Abstract

We consider a class of approximate message passing (AMP) algorithms and characterize their
high-dimensional behavior in terms of a suitable state evolution recursion. Our proof applies to
Gaussian matrices with independent but not necessarily identically distributed entries. It covers,
in particular, the analysis of generalized AMP, introduced by Rangan, and of AMP reconstruction
in compressed sensing with spatially coupled sensing matrices.

The proof technique builds on that of [BM11], while simplifying and generalizing several steps.

1 Introduction 

Approximate message passing (AMP) algorithms [DMM09] apply ideas from graphical models (belief
propagation [Pea88]) and statistical physics (mean field or TAP equations [MPV87, MM09]) to
statistical estimation. In particular, AMP applies to problems that do not admit a sparse graphical
model description. An AMP algorithm takes the form

$$ u^t = A f(v^t;t) - b_t\, g(u^{t-1};t-1), \tag{1} $$
$$ v^{t+1} = A^{\sf T} g(u^t;t) - d_t\, f(v^t;t), \tag{2} $$

with $t\in\mathbb{N}$ being the iteration number. Here $v^t\in\mathbb{R}^n$, $u^t\in\mathbb{R}^m$ are vectors that describe the
algorithm's state, $f(\,\cdot\,;t):\mathbb{R}^n\to\mathbb{R}^n$ and $g(\,\cdot\,;t):\mathbb{R}^m\to\mathbb{R}^m$ are sequences of functions that can be
computed efficiently, and $b_t$, $d_t$ are scalars that can also be computed from the current state. Finally,
$A\in\mathbb{R}^{m\times n}$ is a matrix that is given as part of the data of the estimation problem.
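
As a rough illustration of the iteration (1)-(2), the following Python sketch runs a few steps for user-supplied separable functions. The function name `amp_iterate` and its arguments are hypothetical. The text only says that $b_t$, $d_t$ are computable from the current state; the particular choice used here (sum of derivatives divided by $m$, a common convention when the entries of $A$ have variance $1/m$) is an assumption, as is taking $g(u^{-1};-1)=0$.

```python
import numpy as np

def amp_iterate(A, f, g, f_prime, g_prime, v0, num_iters):
    """Run the AMP iteration (1)-(2) and return the lists of u^t and v^t.

    f, g, f_prime, g_prime are callables (vector, t) -> vector acting entrywise.
    """
    m, n = A.shape
    v = np.asarray(v0, dtype=float).copy()
    g_prev = np.zeros(m)                      # g(u^{-1}; -1) := 0 (assumption)
    us, vs = [], [v.copy()]
    for t in range(num_iters):
        fv = f(v, t)
        b_t = f_prime(v, t).sum() / m         # assumed form of the scalar b_t
        u = A @ fv - b_t * g_prev             # Eq. (1)
        gu = g(u, t)
        d_t = g_prime(u, t).sum() / m         # assumed form of the scalar d_t
        v = A.T @ gu - d_t * fv               # Eq. (2)
        g_prev = gu
        us.append(u)
        vs.append(v.copy())
    return us, vs
```

With `f_prime` and `g_prime` returning zeros the memory terms vanish and the sketch reduces to plain alternating multiplication by $A$ and $A^{\sf T}$, which is a convenient sanity check.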

One domain in which AMP finds application is the ubiquitous problem of estimating an unknown
signal $x\in\mathbb{R}^n$ from noisy linear observations:

$$ y = A x + w . \tag{3} $$

Here $A\in\mathbb{R}^{m\times n}$ is a known sensing matrix and $w\in\mathbb{R}^m$ is a noise vector with i.i.d. components with
$\mathbb{E}\,w_i = 0$, $\mathbb{E}\{w_i^2\} = \sigma^2$. In [DMM09] a class of AMP algorithms was developed for this problem in the
compressed sensing setting, in which $x$ is sparse and $m < n$. Several generalizations (for instance to
signals with small total variation) were developed in [DJM11a], which also provides a more complete
list of references. All of these generalizations can be recast in the form of Eqs. (1), (2) for suitable
choices of the functions $f(\,\cdot\,;t)$ and $g(\,\cdot\,;t)$.

A striking property of AMP algorithms is that their high-dimensional behavior admits an exact
description. Simplifying, for a broad range of random matrices $A$, the vectors $u^t$, $v^t$ have asymptotically
i.i.d. Gaussian entries in the limit $n, m\to\infty$ at $t$ fixed (see next section for a formal statement).
The variance of $u^t_i$, $v^t_i$ can be computed through a one-dimensional recursion termed state evolution,
because of its analogy with density evolution in coding theory [RU08]. The predictions of state
evolution were tested numerically in several papers, see e.g. [DMM09, DMM11, DJM11a, Sch10,
KGR11, KMS+12a, SS12, JM12]. In [BM11] it was proved that state evolution does indeed hold
if $A$ has i.i.d. Gaussian entries and the functions $f(\,\cdot\,;t)$ and $g(\,\cdot\,;t)$ are Lipschitz continuous and
separable.^1 This result was extended in [BLM12] to matrices $A$ that have independent non-Gaussian
entries, under the assumption that the functions $f(\,\cdot\,;t)$ and $g(\,\cdot\,;t)$ are separable polynomials. On the
basis of these results, it is natural to conjecture that state evolution holds for matrices with general
independent entries, whenever $f(\,\cdot\,;t)$ and $g(\,\cdot\,;t)$ are separable and locally Lipschitz with polynomial
growth. This conjecture is still open.

In this paper we focus on Gaussian matrices and consider a different type of generalization that 
was motivated by the following recent developments. 

Generalized AMP. In [Ran11], Rangan proposed a class of generalized message passing algorithms
(G-AMP) which found several interesting applications, see [FRVB11, KBAU12]. In particular,
generalized AMP makes it possible to tackle nonlinear estimation problems in which $x\in\mathbb{R}^n$ is to be
estimated from observations $Y = (Y_1,\dots,Y_m)$. Observations are conditionally independent
given $A$ and $x$, with $Y_i$ distributed according to a model $p(\,\cdot\,|z_i)$ with $z_i = (Ax)_i$. Considering
for simplicity the case in which $p(\,\cdot\,|z_i)$ has a density (denoted again by $p$), the joint density of
$Y = (Y_1,\dots,Y_m)$ is therefore

$$ p_Y(y|A,x) = \prod_{i=1}^m p\big(y_i\,|\,(Ax)_i\big). \tag{4} $$

In information theory parlance, the vector $(Ax)$ is passed through a memoryless channel with
transition probability $p(\,\cdot\,|\,\cdot\,)$. From a statistics point of view, this corresponds to estimation
of a generalized linear model [NW72, MN89]. The linear model (3) is recovered as the special
case in which the channel is Gaussian or, more generally, the noise is purely additive. Rangan
conjectured that suitable state evolution equations hold for G-AMP algorithms as well, without
however providing a formal proof.

Spatial coupling. In a separate line of work, Donoho and the present authors [DJM11b] applied
AMP to compressed sensing reconstruction with spatially coupled sensing matrices. This type
of sensing matrix was developed in [KMS+12b] (see also [KP10] for earlier work in this
direction), which demonstrated heuristically the power of this approach. A mathematical analysis
requires extending state evolution to matrices with independent centered Gaussian entries,
although with non-identical variances (heteroscedastic entries, in the statistics terminology).



^1 Throughout the paper we say that $h:\mathbb{R}^k\to\mathbb{R}^k$ is separable if $h(x_1,x_2,\dots,x_k) = (h_1(x_1),h_2(x_2),\dots,h_k(x_k))$.

More precisely, for $A\in\mathbb{R}^{m\times n}$ we assume that the row index set $[m]=\{1,\dots,m\}$ is partitioned
into $q$ groups, and that the same holds for the column index set $[n]=\{1,\dots,n\}$. Then the
entries $A_{ij}$ are independent Gaussian with mean $\mathbb{E}\{A_{ij}\}=0$ and variance $\mathbb{E}\{A_{ij}^2\}$ depending on
the groups to which $i$ and $j$ belong. Spatially coupled sensing matrices correspond to a special
band-diagonal structure of the block variances.

A rigorous analysis of the implications of state evolution for spatially coupled matrices can
be found in [DJM11b]. In particular, [DJM11b] studied a class of spatially coupled matrices,
and proved that AMP reconstruction achieves the information-theoretic limit stated in [WV10].
More specifically, for sequences of spatially coupled matrices $A\in\mathbb{R}^{m\times n}$ with asymptotic
undersampling rate $\delta=\lim_{n\to\infty}m/n$, AMP reconstructs the signal with high probability, provided
$\delta>\overline{d}(p_X)$, where $\overline{d}(p_X)$ denotes the (upper) Rényi information dimension of $p_X$ [Ren59].
Further, AMP reconstruction is robust to noise.

Robust regression. Bean, Bickel, El Karoui and Yu [BBEKY12] recently considered the problem
of estimating the unknown vector $x$ in the linear model (3) using robust regression. They
developed exact asymptotic expressions for the risk that are analogous to the ones proved in
[BM12] for the Lasso. The results of [BBEKY12] are, on the other hand, based on a heuristic
derivation.

The proof in [BM12] was based on the state evolution analysis of a suitable AMP algorithm
whose fixed points coincide with the Lasso optima. This suggests a possible approach
for proving the results of [BBEKY12]: define a suitable AMP algorithm for solving the robust
regression problem, and analyze it through state evolution. Indeed, a comparison of the formulae
in [BBEKY12] with the state evolution formulae in [Ran11] appears encouraging.

In this paper we establish a rigorous generalization of state evolution that covers all of the above
developments. Applications to generalized AMP are already discussed in [Ran11], and applications
to spatially coupled sensing matrices can be found in [DJM11b] and Section 3. Finally, applications
to robust regression are left for future study.

Remarkably, all of the above applications can be derived by treating the following generalization
of the iteration (1), (2). (A formal definition is given in the next section.)

1. The vectors $u^t\in\mathbb{R}^m$, $v^t\in\mathbb{R}^n$ are replaced by matrices $u^t\in\mathbb{R}^{m\times q}$, $v^t\in\mathbb{R}^{n\times q}$, with $q$ kept
fixed as $m,n\to\infty$.

2. The functions $f$, $g$ appearing in Eqs. (1), (2) are now mappings $f(\,\cdot\,;t):\mathbb{R}^{n\times q}\to\mathbb{R}^{n\times q}$, $g(\,\cdot\,;t):\mathbb{R}^{m\times q}\to\mathbb{R}^{m\times q}$
that are separable across rows (i.e. the $i$-th row of $f(v;t)$ only depends on
the $i$-th row of $v$). Correspondingly, the product $Af(v^t;t)$ has to be interpreted as a matrix
multiplication.

3. The memory terms are modified with $b_t$, $d_t$ replaced by $q\times q$ matrices. More specifically,
$b_t\,g(u^{t-1};t-1)$ and $d_t\,f(v^t;t)$ are respectively replaced by $g(u^{t-1};t-1)\,\mathsf{B}_t^{\sf T}$ and $f(v^t;t)\,\mathsf{D}_t^{\sf T}$, with
$\mathsf{B}_t,\mathsf{D}_t\in\mathbb{R}^{q\times q}$.

Our proof uses the technique of [BM11], which in turn builds on an idea first introduced by
Bolthausen [Bol12]. A convenient simplification with respect to [BM11] consists in studying a recursion
in which the rectangular matrix $A$ is replaced by a symmetric matrix, and the algorithm state
is described by a single vector.

In Section 2 we put forward formal definitions and state our main result for the case of symmetric
matrices. In Section 3 we show how the case of rectangular matrices can be reduced to the symmetric
one. We also show how our result applies to the case of compressed sensing reconstruction with
spatially coupled matrices. Finally, we prove our main result in Section 4.

2 Main result 

We will view AMP as operating on the vector space $\mathcal{V}_{q,N}\equiv(\mathbb{R}^q)^N\simeq\mathbb{R}^{N\times q}$. Given a vector $x\in\mathcal{V}_{q,N}$,
we shall most often regard it as an $N$-vector with entries in $\mathbb{R}^q$, namely $x=(x_1,\dots,x_N)$, with
$x_i\in\mathbb{R}^q$. Components of $x_i\in\mathbb{R}^q$ will be indicated as $(x_i(1),\dots,x_i(q))\equiv x_i$. For $x\in\mathcal{V}_{q,N}$, we define
its norm by $\|x\|=\big(\sum_{i=1}^N\|x_i\|^2\big)^{1/2}$.

Given a matrix $A\in\mathbb{R}^{N\times N}$, we let it act on $\mathcal{V}_{q,N}$ in the natural way, namely for $v',v\in\mathcal{V}_{q,N}$ we
let $v'=Av$ be given by $v'_i=\sum_{j=1}^N A_{ij}v_j$ for all $i\in[N]$. Here and below $[N]\equiv\{1,\dots,N\}$ is the set
of the first $N$ integers. In other words, we identify $A$ with the Kronecker product $A\otimes\mathrm{I}_{q\times q}$.

Definition 1. A symmetric AMP instance is a triple $(A,\mathcal{F},x^0)$ where:

1. $A=G+G^{\sf T}$, where $G\in\mathbb{R}^{N\times N}$ has i.i.d. entries $G_{ij}\sim\mathsf{N}(0,(2N)^{-1})$.

2. $\mathcal{F}=\{f^k:k\in[N]\}$ is a collection of mappings $f^k:\mathbb{R}^q\times\mathbb{N}\to\mathbb{R}^q$, $(x,t)\mapsto f^k(x,t)$, that are
locally Lipschitz in their first argument (and hence almost everywhere differentiable);

3. $x^0\in\mathcal{V}_{q,N}$ is an initial condition.

Given $\mathcal{F}=\{f^k:k\in[N]\}$, we define $f(\,\cdot\,;t):\mathcal{V}_{q,N}\to\mathcal{V}_{q,N}$ by letting $v'=f(v;t)$ be given by
$v'_i=f^i(v_i;t)$ for all $i\in[N]$.

Definition 2. The approximate message passing orbit corresponding to the instance $(A,\mathcal{F},x^0)$ is
the sequence of vectors $\{x^t\}_{t\ge0}$, $x^t\in\mathcal{V}_{q,N}$, defined as follows, for $t\ge0$,

$$ x^{t+1} = A f(x^t;t) - \mathsf{B}_t f(x^{t-1};t-1). \tag{5} $$

Here $\mathsf{B}_t:\mathcal{V}_{q,N}\to\mathcal{V}_{q,N}$ is the linear operator defined by letting, for $v'=\mathsf{B}_tv$,

$$ v'_i = \Big(\frac1N\sum_{j\in[N]}\frac{\partial f^j}{\partial x}(x^t_j,t)\Big)v_i, \tag{6} $$

with $\frac{\partial f^j}{\partial x}$ denoting the Jacobian matrix of $f^j(\,\cdot\,;t):\mathbb{R}^q\to\mathbb{R}^q$.
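
The orbit of Definition 2 can be sketched numerically as follows. This is a minimal illustration rather than a reference implementation: the helper names are hypothetical, the entrywise `tanh` nonlinearity is an arbitrary choice, and $f(x^{-1};-1)$ is taken to be zero (an assumption). The state is stored as an $N\times q$ array, so the operator $\mathsf{B}_t$ acts as right multiplication by the transpose of the averaged Jacobian.

```python
import numpy as np

def sample_symmetric_matrix(N, rng):
    """A = G + G^T with i.i.d. entries G_ij ~ N(0, 1/(2N)), as in Definition 1."""
    G = rng.normal(0.0, np.sqrt(1.0 / (2 * N)), size=(N, N))
    return G + G.T

def tanh_f(x, t):
    """Example separable nonlinearity (arbitrary choice): entrywise tanh."""
    return np.tanh(x)

def tanh_jac(x, t):
    """Row-wise Jacobians of tanh_f, shape (N, q, q); each one is diagonal."""
    q = x.shape[1]
    return (1.0 - np.tanh(x) ** 2)[:, :, None] * np.eye(q)

def amp_orbit(A, f, jac, x0, num_iters):
    """Return [x^0, ..., x^T] for x^{t+1} = A f(x^t;t) - B_t f(x^{t-1};t-1)."""
    N, q = x0.shape
    xs = [x0]
    m_prev = np.zeros((N, q))            # f(x^{-1}; -1) := 0 (assumption)
    for t in range(num_iters):
        m = f(xs[-1], t)
        B = jac(xs[-1], t).mean(axis=0)  # q x q matrix: average Jacobian at x^t
        xs.append(A @ m - m_prev @ B.T)  # row i of (m_prev @ B.T) is B applied to m_prev[i]
        m_prev = m
    return xs
```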
2.1 State evolution 

In order to establish the behavior of the sequence $\{x^t\}_{t\ge0}$ in the high-dimensional limit, we need to
consider a sequence of AMP instances $\{A(N),\mathcal{F}_N,x^{0,N}\}_{N\ge0}$ indexed by the dimension $N$.

Definition 3. We say that the sequence of AMP instances $\{(A(N),\mathcal{F}_N,x^{0,N})\}_{N\ge0}$ is converging
if there exist: (i) an integer $q$; (ii) a function $g:\mathbb{R}^q\times\mathbb{R}^q\times[q]\times\mathbb{N}\to\mathbb{R}^q$ with $g(x,y,a,t)=
(g_1(x,y,a,t),\dots,g_q(x,y,a,t))$, such that, for each $r\in[q]$, $a\in[q]$, $t\in\mathbb{N}$, $g_r(\,\cdot\,,\,\cdot\,,a,t)$ is Lipschitz
continuous; (iii) $q$ probability measures $P_1,\dots,P_q$ on $\mathbb{R}^q$; (iv) for each $N$, a finite partition
$C_1^N\cup C_2^N\cup\dots\cup C_q^N=[N]$; (v) $q$ positive definite matrices $\hat\Sigma^0_1,\dots,\hat\Sigma^0_q\in\mathbb{R}^{q\times q}$, such that the following
happens:

1. For each $a\in[q]$, we have $\lim_{N\to\infty}|C_a^N|/N=c_a\in(0,1)$.

2. For each $N\ge0$, each $a\in[q]$ and each $i\in C_a^N$, we have $f^i(x,t)=g(x,y_i,a,t)$. Further, the
empirical distribution of $\{y_i\}_{i\in C_a^N}$, denoted by $\hat P_a$, converges weakly to $P_a$.

3. For each $a\in[q]$, in probability,

$$ \lim_{N\to\infty}\frac{1}{|C_a^N|}\sum_{i\in C_a^N} g(x^0_i,y_i,a,0)\,g(x^0_i,y_i,a,0)^{\sf T} = \hat\Sigma^0_a . \tag{7} $$

Remark 1. An apparent generalization of the above definition would require the partition to be
$C_1^N\cup C_2^N\cup\dots\cup C_{q'}^N=[N]$, while $x^t\in\mathcal{V}_{q,N}$, with $q\neq q'$. It is easy to see that there is no loss of
generality in assuming $q=q'$ as we do in our definition. Indeed, the case $q' < q$ can be reduced to
our setting by refining the partition arbitrarily, and $q' > q$ by adding dummy coordinates to the
variables $x_i$.

Remark 2. The function $f^i(\,\cdot\,,\,\cdot\,)$ depends implicitly on $y_i$. However, the $y_i$'s do not change across
iterations and so we do not show this dependence explicitly in our notation.

Our next result establishes that the low-dimensional marginals of $\{x^t\}$ are asymptotically Gaussian.
State evolution characterizes the covariance of these marginals. For each $t\ge1$, state evolution
defines a positive semidefinite matrix $\Sigma^t\in\mathbb{R}^{q\times q}$. This is obtained by letting, for each $t\ge1$,

$$ \Sigma^t = \sum_{b=1}^q c_b\,\hat\Sigma^{t-1}_b, \tag{8} $$

$$ \hat\Sigma^t_a = \mathbb{E}\big[g(Z^t_a,Y_a,a,t)\,g(Z^t_a,Y_a,a,t)^{\sf T}\big], \tag{9} $$

for all $a\in[q]$. Here $Y_a\sim P_a$, $Z^t_a\sim\mathsf{N}(0,\Sigma^t)$, and $Y_a$ and $Z^t_a$ are independent.
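
For intuition, the recursion (8)-(9) can be evaluated numerically in the simplest setting $q=1$ with a single group ($c_1=1$) and a function $g$ that does not depend on $y$. The sketch below makes exactly these simplifying assumptions; the function name and the use of Gauss-Hermite quadrature for the Gaussian expectation are illustrative choices, not part of the paper.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def state_evolution_scalar(g, sigma_hat0, num_iters, deg=80):
    """Scalar state evolution: returns [Sigma^1, ..., Sigma^T].

    g(z, t) acts entrywise on an array z; sigma_hat0 plays the role of
    hat-Sigma^0 from Eq. (7).
    """
    nodes, weights = hermegauss(deg)           # quadrature for weight exp(-x^2/2)
    weights = weights / np.sqrt(2.0 * np.pi)   # normalize to the N(0,1) density
    sigma_hat = float(sigma_hat0)
    sigmas = []
    for t in range(1, num_iters + 1):
        sigma = sigma_hat                      # Eq. (8) with q = 1, c_1 = 1
        z = np.sqrt(sigma) * nodes             # Z^t ~ N(0, Sigma^t)
        sigma_hat = float(np.sum(weights * g(z, t) ** 2))   # Eq. (9)
        sigmas.append(sigma)
    return sigmas
```

For a linear function $g(z,t)=cz$ the recursion reduces to $\Sigma^{t+1}=c^2\Sigma^t$, which gives a quick check of the quadrature.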

For $k\ge1$ we say a function $\phi:\mathbb{R}^m\to\mathbb{R}$ is pseudo-Lipschitz of order $k$, and denote it by $\phi\in\mathrm{PL}(k)$,
if there exists a constant $L>0$ such that, for all $x,y\in\mathbb{R}^m$:

$$ |\phi(x)-\phi(y)| \le L\big(1+\|x\|^{k-1}+\|y\|^{k-1}\big)\,\|x-y\| . \tag{10} $$

Notice that if $\phi\in\mathrm{PL}(k)$, then there exists a constant $L'$ such that for all $x\in\mathbb{R}^m$: $|\phi(x)|\le
L'(1+\|x\|^k)$.
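
As a small numerical illustration of condition (10), the helper below (a hypothetical name) evaluates the ratio of the two sides for a given pair of points. For $\phi(x)=\|x\|^2$ and $k=2$ one has $|\phi(x)-\phi(y)|\le(\|x\|+\|y\|)\|x-y\|$, so the ratio never exceeds $1$ and the constant $L=1$ works.

```python
import numpy as np

def pseudo_lipschitz_ratio(phi, x, y, k):
    """Return |phi(x)-phi(y)| / ((1 + |x|^(k-1) + |y|^(k-1)) |x-y|), cf. Eq. (10)."""
    num = abs(phi(x) - phi(y))
    den = (1.0 + np.linalg.norm(x) ** (k - 1)
           + np.linalg.norm(y) ** (k - 1)) * np.linalg.norm(x - y)
    return num / den
```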

Theorem 1. Let $(A(N),\mathcal{F}_N,x^0)_{N\ge0}$ be a converging sequence of AMP instances, and denote by
$\{x^t\}_{t\ge0}$ the corresponding AMP sequence. Suppose further that $\mathbb{E}_{P_a}(\|Y_a\|^{2k-2})$ is bounded, and
$\mathbb{E}_{\hat P_a}(\|Y_a\|^{2k-2})\to\mathbb{E}_{P_a}(\|Y_a\|^{2k-2})$ as $N\to\infty$, for some $k\ge2$. Then for all $t\ge1$, each $a\in[q]$,
and any pseudo-Lipschitz function $\psi:\mathbb{R}^q\times\mathbb{R}^q\to\mathbb{R}$ of order $k$, we have, almost surely,

$$ \lim_{N\to\infty}\frac{1}{|C_a^N|}\sum_{j\in C_a^N}\psi(x^t_j,y_j) = \mathbb{E}\{\psi(Z^t_a,Y_a)\}, \tag{11} $$

where $Z^t_a\sim\mathsf{N}(0,\Sigma^t)$ is independent of $Y_a\sim P_a$.



3 AMP for rectangular and spatially-coupled matrices 



In this section we develop two applications of our main theorem:

1. We show that AMP iterations with $A$ a rectangular matrix, see e.g. Eqs. (1), (2), can be recast
in the form of an iteration with a symmetric matrix $A$ and are therefore covered by Theorem
1. This construction is provided in Section 3.5 (below Proposition 5).

2. We apply the general Theorem 1 to AMP reconstruction in compressed sensing with spatially
coupled matrices. In [DJM11b] it was proved that, conditional on a state evolution
lemma, this approach achieves the information-theoretic limits of compressed sensing set forth
in [WV10]. Here we show that our main result, Theorem 1, implies the state evolution lemma
(Lemma 4.1 in [DJM11b]).



3.1 General matrix ensemble 

We begin by describing a more general matrix ensemble that encompasses spatially coupled matrices,
and will be denoted by $\mathcal{M}(W,m_0,n_0)$. The ensemble depends on two integers $m_0,n_0\in\mathbb{N}$, and on a
matrix with non-negative entries $W\in\mathbb{R}_+^{\mathsf{R}\times\mathsf{C}}$, whose rows and columns are indexed by the finite sets
$\mathsf{R}$, $\mathsf{C}$ (respectively 'rows' and 'columns'). The matrix is roughly row-stochastic, i.e.

$$ \frac12 \le \sum_{c\in\mathsf{C}} W_{r,c} \le 2 , \quad\text{for all } r\in\mathsf{R}. \tag{12} $$

We will let $|\mathsf{R}|\equiv L_r$ and $|\mathsf{C}|\equiv L_c$ denote the matrix dimensions. The ensemble parameters are
related to the sensing matrix dimensions by $n=n_0L_c$ and $m=m_0L_r$.

In order to describe a random matrix $A\sim\mathcal{M}(W,m_0,n_0)$ from this ensemble, partition the column
and row indices of $A$ into, respectively, $L_c$ and $L_r$ groups of equal size. Explicitly,

$$ [n]=\cup_{s\in\mathsf{C}}C_s,\quad |C_s|=n_0, $$
$$ [m]=\cup_{r\in\mathsf{R}}R_r,\quad |R_r|=m_0. $$

Further, if $i\in R_r$ or $j\in C_s$ we will write, respectively, $r=g(i)$ or $s=g(j)$. In other words, $g(\,\cdot\,)$ is
the operator determining the group index of a given row or column.

With this notation we have the following concise definition of the ensemble.

Definition 4. A random sensing matrix $A$ is distributed according to the ensemble $\mathcal{M}(W,m_0,n_0)$
(and we write $A\sim\mathcal{M}(W,m_0,n_0)$) if the entries $\{A_{ij},\ i\in[m],\ j\in[n]\}$ are independent Gaussian
random variables with

$$ A_{ij}\sim\mathsf{N}\Big(0,\frac{1}{m_0}W_{g(i),g(j)}\Big). \tag{13} $$

See Fig. 1 for a schematic of the matrix $A$. Note that the ensemble $\mathcal{M}(W,m_0,n_0)$ includes, as a special
case, rectangular non-symmetric matrices with i.i.d. entries.

Figure 1: Construction of the spatially coupled measurement matrix $A$ for compressive sensing as described
in Section 3.1. The matrix is divided into blocks of size $m_0$ by $n_0$. (The numbers of blocks in each row and
each column are respectively $L_c$ and $L_r$, hence $m=m_0L_r$, $n=n_0L_c$.) The matrix elements $A_{ij}$ are chosen
as $\mathsf{N}(0,\frac{1}{m_0}W_{g(i),g(j)})$. In this figure, $W_{i,j}$ depends only on $|i-j|$ and thus blocks on each diagonal have the
same variance.
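
A matrix from the ensemble of Definition 4 can be sampled along the following lines. The function name is hypothetical, and the sketch assumes that the groups $R_r$ and $C_s$ are contiguous blocks of indices, which is one convenient choice of the equal-size partition.

```python
import numpy as np

def sample_ensemble(W, m0, n0, rng):
    """Sample A with independent entries A_ij ~ N(0, W[g(i), g(j)] / m0), cf. Eq. (13).

    Returns A together with the group-index arrays g(i) for rows and g(j) for columns.
    """
    W = np.asarray(W, dtype=float)
    Lr, Lc = W.shape
    row_group = np.repeat(np.arange(Lr), m0)   # g(i), contiguous groups (assumption)
    col_group = np.repeat(np.arange(Lc), n0)   # g(j), contiguous groups (assumption)
    var = W[np.ix_(row_group, col_group)] / m0
    A = rng.normal(size=var.shape) * np.sqrt(var)
    return A, row_group, col_group
```

Blocks with $W_{r,s}=0$ come out identically zero, which is how a band-diagonal $W$ produces the band structure sketched in Fig. 1.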



3.2 AMP for compressed sensing reconstruction 

AMP algorithms were applied in [DJM11b] to compressed sensing reconstruction with spatially
coupled sensing matrices [KMS+12b]. Here we follow the scheme and notations of [DJM11b]. In
particular, we assume that the unknown vector $x$ to be reconstructed has entries whose empirical
distribution converges weakly to a probability measure $p_X$ over $\mathbb{R}$. The AMP algorithm takes the
following form (initialized with $x^1_i=\mathbb{E}_{p_X}(X)$ for all $i\in[n]$):

$$ x^{t+1} = \eta_t\big(x^t+(Q^t\odot A)^{\sf T}r^t\big), \tag{14} $$
$$ r^t = y - Ax^t + b^t\odot r^{t-1}. \tag{15} $$

Here, for each $t$, $\eta_t:\mathbb{R}^n\to\mathbb{R}^n$ is a differentiable non-linear function that depends on the input
distribution $p_X$. Further, for $v\in\mathbb{R}^n$, we have $\eta_t(v)=(\eta_{t,1}(v_1),\dots,\eta_{t,n}(v_n))$ for some functions
$\eta_{t,i}:\mathbb{R}\to\mathbb{R}$. The symbol $\odot$ indicates the Hadamard (entrywise) product. The specific choices for
$\eta_t$, $Q^t$, $b^t$ are given in Section 3.4 below.

3.3 State evolution

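Given $W\in\mathbb{R}_+^{\mathsf{R}\times\mathsf{C}}$ roughly row-stochastic, and undersampling rate $\delta\in(0,1)$, the corresponding state
evolution is defined as follows. Start with the initial condition

$$ \psi_i(0)=\infty \quad\text{for all } i\in\mathsf{C}. \tag{16} $$

For all $t\ge0$, $a\in\mathsf{R}$, and $i\in\mathsf{C}$, let

$$ \phi_a(t) = \sigma^2+\frac1\delta\sum_{i\in\mathsf{C}}W_{a,i}\,\psi_i(t), \qquad
   \psi_i(t+1) = \mathrm{mmse}\Big(\sum_{b\in\mathsf{R}}W_{b,i}\,\phi_b(t)^{-1}\Big). \tag{17} $$

Here and below, $\mathrm{mmse}(s)$ denotes the minimum mean square error in estimating $X\sim p_X$ from a
noisy observation in Gaussian noise, at signal-to-noise ratio $s$. Formally,

$$ \mathrm{mmse}(s) = \mathbb{E}\big\{[X-\mathbb{E}[X|Y]]^2\big\}, \qquad Y=\sqrt{s}\,X+Z , $$

where $Z\sim\mathsf{N}(0,1)$ is independent of $X$.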
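
The recursion (16)-(17) is easy to iterate numerically once an mmse function is available. The sketch below is illustrative only: it takes the update $\phi_a(t)=\sigma^2+\delta^{-1}\sum_i W_{a,i}\psi_i(t)$, uses a standard Gaussian prior for which $\mathrm{mmse}(s)=1/(1+s)$ (any other prior can be plugged in through the `mmse` argument), and encodes the initial condition $\psi_i(0)=\infty$ as $\phi_a(0)^{-1}=0$, which is valid when every row of $W$ has a positive entry. All function names are hypothetical.

```python
import numpy as np

def mmse_gaussian(s):
    """mmse(s) for X ~ N(0, 1) observed as Y = sqrt(s) X + Z: equals 1 / (1 + s)."""
    return 1.0 / (1.0 + np.asarray(s, dtype=float))

def coupled_state_evolution(W, delta, sigma2, mmse, num_iters):
    """Iterate (16)-(17); returns ([psi(1), ..., psi(T)], [phi(1), ..., phi(T)])."""
    W = np.asarray(W, dtype=float)
    Lr, Lc = W.shape
    phi_inv = np.zeros(Lr)                  # psi_i(0) = +inf  =>  phi_a(0)^{-1} = 0
    psis, phis = [], []
    for _ in range(num_iters):
        psi = mmse(W.T @ phi_inv)           # psi_i(t+1) = mmse(sum_b W[b,i] / phi_b(t))
        phi = sigma2 + (W @ psi) / delta    # phi_a(t+1) = sigma^2 + (1/delta) sum_i W[a,i] psi_i(t+1)
        phi_inv = 1.0 / phi                 # requires phi > 0 (e.g. sigma2 > 0)
        psis.append(psi)
        phis.append(phi)
    return psis, phis
```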

3.4 Construction of $\eta_t$, $b^t$, $Q^t$

In the constructions for the matrix $Q^t$, the nonlinearities $\eta_t$, and the vector $b^t$, we use the fact that
the state evolution sequence can be precomputed.
Define $Q^t$ by

$$ Q^t_{ij} = \frac{\phi_{g(i)}(t)^{-1}}{\sum_{k\in\mathsf{R}}W_{k,g(j)}\,\phi_k(t)^{-1}} . \tag{18} $$

The nonlinearity $\eta_t$ is chosen as follows:

$$ \eta_t(v) = \big(\eta_{t,1}(v_1),\eta_{t,2}(v_2),\dots,\eta_{t,n}(v_n)\big), \tag{19} $$

where $\eta_{t,i}$ is the conditional expectation estimator for $X\sim p_X$ in Gaussian noise:

$$ \eta_{t,i}(v_i) = \mathbb{E}\big\{X \,\big|\, X+s_{g(i)}(t)^{-1/2}Z=v_i\big\}, \qquad s_r(t)=\sum_{u\in\mathsf{R}}W_{u,r}\,\phi_u(t)^{-1}. \tag{20} $$

Notice that the function $\eta_{t,i}(\,\cdot\,)$ depends on $i$ only through the group index $g(i)$, and in fact
parametrically through $s_{g(i)}(t)$. We define $\eta_{t,i}=\hat\eta_{t,u}$ for $i\in C_u$.
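
The group-level quantities entering this construction can be computed directly from the state evolution sequence. The sketch below assumes the normalization $\tilde Q_{r,s}=\phi_r^{-1}/\sum_k W_{k,s}\phi_k^{-1}$ at the level of groups (this form is an assumption; it is the one consistent with the identity $\sum_r W_{r,s}\tilde Q_{r,s}=1$ used in the proof of Proposition 5). The function name is hypothetical.

```python
import numpy as np

def coupling_weights(W, phi):
    """Return (Q, s): Q[r, c] = phi_r^{-1} / s_c with s_c = sum_r W[r, c] / phi_r."""
    W = np.asarray(W, dtype=float)
    phi_inv = 1.0 / np.asarray(phi, dtype=float)
    s = W.T @ phi_inv                     # effective signal-to-noise ratio per column group
    Q = phi_inv[:, None] / s[None, :]
    return Q, s
```

Two identities follow immediately and serve as checks: $\sum_r W_{r,c}Q_{r,c}=1$ and $\sum_r W_{r,c}Q_{r,c}^2\phi_r=1/s_c$.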

Finally, in order to define the vector $b^t$, let us introduce the quantity (with $\eta'_{t,i}$ denoting the
derivative of $v_i\mapsto\eta_{t,i}(v_i)$)

$$ \langle\eta'_t\rangle_u = \frac{1}{n_0}\sum_{i\in C_u}\eta'_{t,i}\big(x^t_i+((Q^t\odot A)^{\sf T}r^t)_i\big). \tag{21} $$

The vector $b^t$ is then defined by

$$ b^t_i = \frac1\delta\sum_{u\in\mathsf{C}}W_{g(i),u}\,\tilde Q^{t-1}_{g(i),u}\,\langle\eta'_{t-1}\rangle_u , \tag{22} $$

where we defined $\tilde Q^t_{r,u}=Q^t_{ij}$ for $i\in R_r$, $j\in C_u$.

The following lemma (Lemma 4.1 in [DJM11b]) claims that the state evolution (17) allows an
exact asymptotic analysis of the AMP algorithm (14)-(15) in the limit of a large number of dimensions.

Lemma 1. Let $W\in\mathbb{R}_+^{\mathsf{R}\times\mathsf{C}}$ be a roughly row-stochastic matrix and $\phi(t)$, $Q^t$, $b^t$ be defined as in Section
3.4. Let $m_0=m_0(n_0)$ be such that $m_0/n_0\to\delta$, as $n_0\to\infty$, and let $A(n)\sim\mathcal{M}(W,m_0,n_0)$. Further
suppose that the empirical distribution of the entries of $x(n)$ converges weakly to a probability measure
$p_X$ on $\mathbb{R}$ with bounded second moment and the empirical second moment of $x(n)$ also converges to
$\mathbb{E}_{p_X}(X^2)$. Similarly, suppose that the empirical distribution of the entries of $w(n)$ converges weakly
to a probability measure $p_W$ on $\mathbb{R}$ with bounded second moment and the empirical second moment of
$w(n)$ also converges to $\mathbb{E}_{p_W}(W^2)=\sigma^2$. Then, for all $t\ge1$, almost surely we have

$$ \limsup_{n_0\to\infty}\frac{1}{n_0}\big\|x^t_{C_a}(A(n);y(n))-x_{C_a}\big\|_2^2 = \mathrm{mmse}\Big(\sum_{i\in\mathsf{R}}W_{i,a}\,\phi_i(t-1)^{-1}\Big), \tag{23} $$

for all $a\in\mathsf{C}$, where $x^t_{C_a},x_{C_a}\in\mathbb{R}^{n_0}$ respectively denote the restrictions of $x^t$, $x$ to indices in $C_a$.

3.5 Proof of Lemma 1

We show that Lemma 1 follows from Theorem 1. Consider the following change of variables:

$$ \tilde x^{t+1} = x-(Q^t\odot A)^{\sf T}r^t-x^t, \tag{24} $$
$$ \tilde r^t = w-r^t. \tag{25} $$

Rewriting Eqs. (14) and (15) in terms of $\tilde x$ and $\tilde r$, we obtain

$$ \tilde x^{t+1} = (Q^t\odot A)^{\sf T}(\tilde r^t-w)-\{\eta_{t-1}(x-\tilde x^t)-x\}, \tag{26} $$
$$ \tilde r^t = A\{\eta_{t-1}(x-\tilde x^t)-x\}+b^t\odot(\tilde r^{t-1}-w). \tag{27} $$

Let $q=L_r+L_c$ and define functions $e(\,\cdot\,,\,\cdot\,,\,\cdot\,;t)$, $h(\,\cdot\,,\,\cdot\,,\,\cdot\,;t):\mathbb{R}^q\times\mathbb{R}^q\times[q]\to\mathbb{R}^q$ as follows:

$$ h(u,w,a;t) = \sqrt{L_r}\,\big(u(a)-w(a)\big)\Big[\sqrt{W_{a,1}}\,\tilde Q^t_{a,1},\dots,\sqrt{W_{a,L_c}}\,\tilde Q^t_{a,L_c},*,\dots,*\Big] \quad\text{for } a\in[L_r], $$
$$ e(v,y,a;t) = \sqrt{L_r}\,\big\{\hat\eta_{t-1,a}(y(a)-v(a))-y(a)\big\}\Big[\sqrt{W_{1,a}},\dots,\sqrt{W_{L_r,a}},*,\dots,*\Big] \quad\text{for } a\in[L_c]. $$

In our definition, we do not care about the values of entries represented by $*$, since they are irrelevant
for our purposes. Values of $h(u,w,a;t)$ for $a\in\{L_r+1,\dots,L_r+L_c\}$ and $e(v,y,a;t)$ for $a\in
\{L_c+1,\dots,L_r+L_c\}$ are also irrelevant for our purposes and can be defined arbitrarily. Note that
$h,e\in\mathrm{PL}(2)$. We also define the function $e(\,\cdot\,,\,\cdot\,;t):\mathcal{V}_{q,n}\times\mathcal{V}_{q,n}\to\mathcal{V}_{q,n}$ by letting $v'=e(v,y;t)$ be given
by $v'_j=e(v_j,y_j,g(j);t)$ for all $j\in[n]$. Similarly, $h(\,\cdot\,,\,\cdot\,;t):\mathcal{V}_{q,m}\times\mathcal{V}_{q,m}\to\mathcal{V}_{q,m}$ is defined by letting
$u'=h(u,w;t)$ be given by $u'_i=h(u_i,w_i,g(i);t)$ for all $i\in[m]$.

Let $\tilde A\in\mathbb{R}^{m\times n}$ be a normalized version of $A$, obtained as follows:

$$ \tilde A_{ij} = \frac{1}{\sqrt{L_r\,W_{g(i),g(j)}}}\,A_{ij}. $$

Therefore, $\tilde A$ has i.i.d. entries $\mathsf{N}(0,1/m)$.

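Proposition 5. Consider the following approximate message passing orbit with vectors $\{u^t,v^t\}_{t\ge0}$,
$v^t\in\mathcal{V}_{q,n}$, $u^t\in\mathcal{V}_{q,m}$:

$$ u^t = \tilde A\,e(v^t,y;t)-\mathsf{B}_t\,h(u^{t-1},w;t-1), \tag{28} $$
$$ v^{t+1} = \tilde A^{\sf T}h(u^t,w;t)-\mathsf{D}_t\,e(v^t,y;t), \tag{29} $$

for given $y\in\mathcal{V}_{q,n}$ and $w\in\mathcal{V}_{q,m}$. Here $\mathsf{B}_t:\mathcal{V}_{q,m}\to\mathcal{V}_{q,m}$ is the linear operator defined by letting, for
$z'=\mathsf{B}_tz$, and any $i\in[m]$,

$$ z'_i = \Big(\frac1m\sum_{k\in[n]}\frac{\partial e}{\partial v}(v^t_k,y_k,g(k);t)\Big)z_i . \tag{30} $$

Analogously, $\mathsf{D}_t:\mathcal{V}_{q,n}\to\mathcal{V}_{q,n}$ is the linear operator defined by letting, for $z'=\mathsf{D}_tz$, and any $j\in[n]$,

$$ z'_j = \Big(\frac1m\sum_{l\in[m]}\frac{\partial h}{\partial u}(u^t_l,w_l,g(l);t)\Big)z_j . \tag{31} $$

Assume that $y=(y_1,\dots,y_n)$, $w=(w_1,\dots,w_m)$, and $v^1=(v^1_1,\dots,v^1_n)$ are given by

$$ y_k = (*,\dots,*,x_k,*,\dots,*)\in\mathbb{R}^q, \quad \forall k\in[n], $$
$$ w_k = (*,\dots,*,w_k,*,\dots,*)\in\mathbb{R}^q, \quad \forall k\in[m], $$
$$ v^1_k = (*,\dots,*,\tilde x^1_k,*,\dots,*)\in\mathbb{R}^q, \quad \forall k\in[n], $$

where in each case the displayed entry sits at position $g(k)$.
Then, we have $u^t_i(g(i))=\tilde r^t_i$ and $v^{t+1}_j(g(j))=\tilde x^{t+1}_j$, for all $i\in[m]$, $j\in[n]$, and $t\ge0$.

We refer to Section 3.5.1 for the proof of Proposition 5.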

We proceed by constructing a suitable converging sequence of symmetric AMP instances, recognizing
that a subset of the resulting orbit corresponds to the orbit $\{v^t,u^t\}$ of interest. The converging
symmetric AMP instances $(A_s(N),g,x^0_s)$ are defined as:

• The instance has dimensions $N=m+n$ and $q=L_r+L_c$.

• Let $B_1=C_1+C_1^{\sf T}$ and $B_2=C_2+C_2^{\sf T}$, where $C_1\in\mathbb{R}^{m\times m}$ and $C_2\in\mathbb{R}^{n\times n}$ have i.i.d. entries
distributed as $\mathsf{N}(0,(2m)^{-1})$. The symmetric matrix $A_s$ is given by

$$ A_s = \sqrt{\frac{\delta}{\delta+1}}\begin{pmatrix} B_1 & \tilde A\\ \tilde A^{\sf T} & B_2\end{pmatrix}. $$

• Let $y_{s,i}=w_i\in\mathbb{R}^q$ for $i\le m$ and $y_{s,i}=y_{i-m}\in\mathbb{R}^q$ for $i>m$.

• The initial condition is given by $x^0_s=(x^0_{s,1},\dots,x^0_{s,N})\in\mathcal{V}_{q,N}$, where $x^0_{s,i}=0$ for $i\le m$ and
$x^0_{s,i}=v^1_{i-m}$ for $m < i\le m+n$.

• Finally, for any $x,y\in\mathbb{R}^q$, $t\ge0$, we let

$$ g(x,y,a,2t) = 0 \quad\text{for } a\in\{1,\dots,L_r\}, \tag{32} $$
$$ g(x,y,a,2t) = \sqrt{\tfrac{\delta+1}{\delta}}\;e(x,y,a-L_r;t) \quad\text{for } a\in\{L_r+1,\dots,L_r+L_c\}, \tag{33} $$
$$ g(x,y,a,2t+1) = \sqrt{\tfrac{\delta+1}{\delta}}\;h(x,y,a;t+1) \quad\text{for } a\in\{1,\dots,L_r\}, \tag{34} $$
$$ g(x,y,a,2t+1) = 0 \quad\text{for } a\in\{L_r+1,\dots,L_r+L_c\}. \tag{35} $$

Now, it is easy to see that, for all $t\ge0$,

$$ x^{2t+1}_{s,i} = u^{t+1}_i, \quad\text{for } i\le m, \tag{36} $$
$$ x^{2t}_{s,i} = v^{t+1}_{i-m}, \quad\text{for } m+1\le i\le m+n. \tag{37} $$
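
The symmetric matrix $A_s$ in the construction above can be assembled from a rectangular $\tilde A$ as in the sketch below. The function name is hypothetical, and the scaling factor is written as $\sqrt{m/(m+n)}$, which coincides with $\sqrt{\delta/(\delta+1)}$ when $\delta$ is read as $m/n$ (an assumption of this sketch).

```python
import numpy as np

def symmetric_embedding(A_tilde, rng):
    """Embed an m x n matrix with N(0, 1/m) entries into a symmetric (m+n) x (m+n) matrix."""
    m, n = A_tilde.shape
    C1 = rng.normal(0.0, np.sqrt(1.0 / (2 * m)), size=(m, m))
    C2 = rng.normal(0.0, np.sqrt(1.0 / (2 * m)), size=(n, n))
    B1 = C1 + C1.T                       # symmetric diagonal blocks
    B2 = C2 + C2.T
    scale = np.sqrt(m / (m + n))         # = sqrt(delta / (delta + 1)) with delta = m / n
    return scale * np.block([[B1, A_tilde], [A_tilde.T, B2]])
```

After rescaling, the off-diagonal entries have variance $1/N$ with $N=m+n$, matching the normalization $A=G+G^{\sf T}$, $G_{ij}\sim\mathsf{N}(0,(2N)^{-1})$ of Definition 1.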



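Now we are ready to prove Lemma 1 by applying Theorem 1.

Fix $a'\in\{L_r+1,\dots,L_r+L_c\}$ and $t\ge1$. Let $a=a'-L_r$ and choose the function $\psi(x,y)=
\{\hat\eta_{t,a}(y(a)-x(a))-y(a)\}^2$. Then,

$$ \lim_{n_0\to\infty}\frac{1}{n_0}\sum_{j\in C^N_{a'}}\psi(x^{2t}_{s,j},y_{s,j})
 = \lim_{n_0\to\infty}\frac{1}{n_0}\sum_{j\in C^N_{a'}}\big[\hat\eta_{t,a}\big(y_{s,j}(a)-x^{2t}_{s,j}(a)\big)-y_{s,j}(a)\big]^2 $$
$$ \stackrel{(a)}{=} \lim_{n_0\to\infty}\frac{1}{n_0}\sum_{j'\in C_a}\big[\hat\eta_{t,a}\big(y_{j'}(a)-v^{t+1}_{j'}(a)\big)-y_{j'}(a)\big]^2 $$
$$ \stackrel{(b)}{=} \lim_{n_0\to\infty}\frac{1}{n_0}\sum_{j'\in C_a}\big[\eta_{t,j'}\big(x_{j'}-\tilde x^{t+1}_{j'}\big)-x_{j'}\big]^2 $$
$$ = \lim_{n_0\to\infty}\frac{1}{n_0}\sum_{j'\in C_a}\big(x^{t+1}_{j'}-x_{j'}\big)^2
 = \lim_{n_0\to\infty}\frac{1}{n_0}\big\|x^{t+1}_{C_a}-x_{C_a}\big\|^2 . \tag{38} $$

Here (a) follows from Eq. (37) and the definition of $y_{s,j}$ (note that $j'=j-m$); (b) follows from the
fact that $a=g(j')$ and Proposition 5.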

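Applying Theorem 1, we have almost surely

$$ \lim_{n_0\to\infty}\frac{1}{n_0}\sum_{j\in C^N_{a'}}\psi(x^{2t}_{s,j},y_{s,j}) = \mathbb{E}\big\{[\hat\eta_{t,a}(X-Z)-X]^2\big\}, \tag{39} $$

with $X\sim p_X$ and $Z\sim\mathsf{N}(0,\Sigma^{2t}_{a,a})$. Therefore, to complete the proof we need to show that

$$ \big(\Sigma^{2t}_{a,a}\big)^{-1} = \sum_{i\in\mathsf{R}}W_{i,a}\,\phi_i(t)^{-1}. \tag{40} $$

Note that Eq. (8) reduces to:

$$ \Sigma^t = \frac{m_0}{m+n}\sum_{b'=1}^{L_r}\hat\Sigma^{t-1}_{b'}+\frac{n_0}{m+n}\sum_{b'=L_r+1}^{L_r+L_c}\hat\Sigma^{t-1}_{b'} . \tag{41} $$

By definition of the function $g$ (see Eqs. (32)-(35)), it is easy to see that Eq. (9) reduces to:

$$ \big(\hat\Sigma^{2t}_{a'}\big)_{i,j} = \begin{cases}
0, & \text{for } a'\in[L_r],\\
\frac{\delta+1}{\delta}\,L_r\sqrt{W_{i,a}W_{j,a}}\;\mathbb{E}\{\hat\eta_{t,a}(X-Z_a)-X\}^2, & \text{for } a'\in\{L_r+1,\dots,L_r+L_c\},\ i,j\in[L_r],\\
*, & \text{otherwise.}
\end{cases} \tag{42} $$

Here $a=a'-L_r$, $X\sim p_X$ and $Z_a\sim\mathsf{N}(0,\Sigma^{2t}_{a,a})$. Also,

$$ \big(\hat\Sigma^{2t-1}_{a'}\big)_{i,j} = \begin{cases}
\frac{\delta+1}{\delta}\,L_r\sqrt{W_{a',i}W_{a',j}}\;\tilde Q^t_{a',i}\tilde Q^t_{a',j}\,\{\sigma^2+\Sigma^{2t-1}_{a',a'}\}, & \text{for } a'\in[L_r],\ i,j\in[L_c],\\
0, & \text{for } a'\in\{L_r+1,\dots,L_r+L_c\},\\
*, & \text{otherwise.}
\end{cases} \tag{43} $$

Consequently, we obtain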



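$$ \Sigma^{2t}_{a,a} = \frac{m_0}{m+n}\sum_{b=1}^{L_r}\big(\hat\Sigma^{2t-1}_b\big)_{a,a} $$
$$ = \frac{m_0L_r}{m+n}\,\frac{\delta+1}{\delta}\sum_{b=1}^{L_r}W_{b,a}\big(\tilde Q^t_{b,a}\big)^2\big\{\sigma^2+\Sigma^{2t-1}_{b,b}\big\} $$
$$ = \frac{m_0L_r}{m+n}\,\frac{\delta+1}{\delta}\sum_{b=1}^{L_r}W_{b,a}\big(\tilde Q^t_{b,a}\big)^2\Big\{\sigma^2+\frac{n_0}{m+n}\sum_{c'=L_r+1}^{L_r+L_c}\big(\hat\Sigma^{2t-2}_{c'}\big)_{b,b}\Big\} $$
$$ = \frac{m_0L_r}{m+n}\,\frac{\delta+1}{\delta}\sum_{b=1}^{L_r}W_{b,a}\big(\tilde Q^t_{b,a}\big)^2\Big\{\sigma^2+\frac{n_0L_r}{m+n}\,\frac{\delta+1}{\delta}\sum_{c=1}^{L_c}W_{b,c}\,\mathrm{mmse}\big((\Sigma^{2t-2}_{c,c})^{-1}\big)\Big\} $$
$$ = \sum_{b=1}^{L_r}W_{b,a}\big(\tilde Q^t_{b,a}\big)^2\Big\{\sigma^2+\frac1\delta\sum_{c=1}^{L_c}W_{b,c}\,\mathrm{mmse}\big((\Sigma^{2t-2}_{c,c})^{-1}\big)\Big\}. $$

We prove relation (40) using induction on $t$. The induction basis ($t=0$) is trivial. Suppose that
the claim holds for $t-1$. Then,

$$ \Sigma^{2t}_{a,a} = \sum_{b=1}^{L_r}W_{b,a}\big(\tilde Q^t_{b,a}\big)^2\Big\{\sigma^2+\frac1\delta\sum_{c=1}^{L_c}W_{b,c}\,\mathrm{mmse}\Big(\sum_{i\in\mathsf{R}}W_{i,c}\,\phi_i(t-1)^{-1}\Big)\Big\} $$
$$ = \sum_{b=1}^{L_r}W_{b,a}\big(\tilde Q^t_{b,a}\big)^2\,\phi_b(t) $$
$$ = \sum_{b=1}^{L_r}\frac{W_{b,a}\,\phi_b(t)^{-1}}{\big(\sum_{i\in\mathsf{R}}W_{i,a}\,\phi_i(t)^{-1}\big)^2} $$
$$ = \Big(\sum_{i\in\mathsf{R}}W_{i,a}\,\phi_i(t)^{-1}\Big)^{-1}. $$

This proves the induction claim for $t$. Combining (38), (39) and (40), Lemma 1 follows.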
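3.5.1 Proof of Proposition 5

We prove the result by induction on $t$. For $t=0$, the claim follows from our definition. Suppose
that the claim holds for $t-1$; we prove it for $t$.

Writing Eq. (28) for coordinate $i$, we have

$$ u^t_i = \sum_{k\in[n]}\tilde A_{ik}\,e(v^t_k,y_k,g(k);t)-\Big(\frac1m\sum_{k\in[n]}\frac{\partial e}{\partial v}(v^t_k,y_k,g(k);t)\Big)h(u^{t-1}_i,w_i,g(i);t-1). \tag{44} $$

Restricting to coordinate $g(i)$, we get

$$ u^t_i(g(i)) = \sum_{k\in[n]}\tilde A_{ik}\big[e(v^t_k,y_k,g(k);t)\big]_{g(i)}
 -\frac1m\sum_{k\in[n]}\Big[\frac{\partial e}{\partial v}\big(v^t_k(g(k)),y_k(g(k)),g(k);t\big)\Big]_{g(i)}\big[h(u^{t-1}_i,w_i,g(i);t-1)\big]_{g(k)}. \tag{45} $$

Here, we have used the fact that $e(v^t_k,y_k,g(k),t)$ does not depend on $v^t_k(l)$ for $l\neq g(k)$.
Substituting for $e$ and $h$, we have

$$ \sum_{k\in[n]}\tilde A_{ik}\big[e(v^t_k,y_k,g(k);t)\big]_{g(i)}
 = \sum_{k\in[n]}\tilde A_{ik}\sqrt{L_rW_{g(i),g(k)}}\,\big\{\hat\eta_{t-1,g(k)}\big(y_k(g(k))-v^t_k(g(k))\big)-y_k(g(k))\big\} $$
$$ = \sum_{k\in[n]}A_{ik}\big\{\eta_{t-1,k}(x_k-\tilde x^t_k)-x_k\big\}, \tag{46} $$

where we used the induction hypothesis in the last step. Furthermore,

$$ \frac1m\sum_{k\in[n]}\Big[\frac{\partial e}{\partial v}\big(v^t_k(g(k)),y_k(g(k)),g(k);t\big)\Big]_{g(i)}\big[h(u^{t-1}_i,w_i,g(i);t-1)\big]_{g(k)} $$
$$ = -\frac1m\sum_{k\in[n]}\hat\eta'_{t-1,g(k)}\big(y_k(g(k))-v^t_k(g(k))\big)\sqrt{L_rW_{g(i),g(k)}}\;\big(u^{t-1}_i(g(i))-w_i(g(i))\big)\sqrt{L_rW_{g(i),g(k)}}\;\tilde Q^{t-1}_{g(i),g(k)} $$
$$ = -\frac{1}{m_0}\sum_{k\in[n]}\eta'_{t-1,k}(x_k-\tilde x^t_k)\,W_{g(i),g(k)}\,\tilde Q^{t-1}_{g(i),g(k)}\,\big(\tilde r^{t-1}_i-w_i\big) $$
$$ = -\,b^t_i\,\big(\tilde r^{t-1}_i-w_i\big), \tag{47} $$

where we used the induction hypothesis in the second equality. The last equality follows from the
definition of $b^t_i$ (see Eq. (22)).

Using (46) and (47) in (45), we obtain

$$ u^t_i(g(i)) = \sum_{k\in[n]}A_{ik}\big\{\eta_{t-1,k}(x_k-\tilde x^t_k)-x_k\big\}+b^t_i\big(\tilde r^{t-1}_i-w_i\big) = \tilde r^t_i , \tag{48} $$

where the second equality follows from (27). This proves the induction claim for $u^t_i(g(i))$.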

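Next we prove the claim for $v^{t+1}_j(g(j))$. Writing Eq. (29) for coordinate $j$, we have

$$ v^{t+1}_j = \sum_{l\in[m]}\tilde A_{lj}\,h(u^t_l,w_l,g(l);t)-\Big(\frac1m\sum_{l\in[m]}\frac{\partial h}{\partial u}(u^t_l,w_l,g(l);t)\Big)e(v^t_j,y_j,g(j);t). \tag{49} $$

Restricting to coordinate $g(j)$, we get

$$ v^{t+1}_j(g(j)) = \sum_{l\in[m]}\tilde A_{lj}\big[h(u^t_l,w_l,g(l);t)\big]_{g(j)}
 -\frac1m\sum_{l\in[m]}\Big[\frac{\partial h}{\partial u}\big(u^t_l(g(l)),w_l(g(l)),g(l);t\big)\Big]_{g(j)}\big[e(v^t_j,y_j,g(j);t)\big]_{g(l)}. \tag{50} $$

Here, we have used the fact that $h(u^t_l,w_l,g(l),t)$ does not depend on $u^t_l(k)$ for $k\neq g(l)$.
Substituting for $e$ and $h$, we have

$$ \sum_{l\in[m]}\tilde A_{lj}\big[h(u^t_l,w_l,g(l);t)\big]_{g(j)}
 = \sum_{l\in[m]}\tilde A_{lj}\sqrt{L_rW_{g(l),g(j)}}\;\tilde Q^t_{g(l),g(j)}\big(u^t_l(g(l))-w_l(g(l))\big)
 = \sum_{l\in[m]}A_{lj}\,Q^t_{lj}\,\big(\tilde r^t_l-w_l\big), \tag{51} $$

where in the last step we used the result $u^t_l(g(l))=\tilde r^t_l$, proved above. Moreover,

$$ \frac1m\sum_{l\in[m]}\Big[\frac{\partial h}{\partial u}\big(u^t_l(g(l)),w_l(g(l)),g(l);t\big)\Big]_{g(j)}\big[e(v^t_j,y_j,g(j);t)\big]_{g(l)} $$
$$ = \frac1m\sum_{l\in[m]}\sqrt{L_rW_{g(l),g(j)}}\;\tilde Q^t_{g(l),g(j)}\,\big\{\hat\eta_{t-1,g(j)}\big(y_j(g(j))-v^t_j(g(j))\big)-y_j(g(j))\big\}\sqrt{L_rW_{g(l),g(j)}} $$
$$ = \Big(\sum_{r\in\mathsf{R}}W_{r,g(j)}\,\tilde Q^t_{r,g(j)}\Big)\big\{\eta_{t-1,j}(x_j-\tilde x^t_j)-x_j\big\} $$
$$ = \eta_{t-1,j}(x_j-\tilde x^t_j)-x_j . \tag{52} $$

Using (51) and (52) in (50), we obtain

$$ v^{t+1}_j(g(j)) = \sum_{l\in[m]}A_{lj}\,Q^t_{lj}\,\big(\tilde r^t_l-w_l\big)-\big\{\eta_{t-1,j}(x_j-\tilde x^t_j)-x_j\big\} = \tilde x^{t+1}_j , \tag{53} $$

where the second equality follows from (26). This proves the induction claim for $v^{t+1}_j(g(j))$.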


4 Proof of Theorem 1

4.1 Definitions and notations

Letting $m^t=f(x^t;t)$ for $t\ge0$, Eq. (5) becomes

$$ x^{t+1} = Am^t-\mathsf{B}_tm^{t-1}. \tag{54} $$

This is initialized with $m^{-1}=0$ and $m^0=m^{0,N}\in\mathcal{V}_{q,N}$, a sequence of deterministic vectors in $\mathcal{V}_{q,N}$,
with $\limsup_{N\to\infty}N^{-1}\sum_{i=1}^N\|m^0_i\|^{2k-2}<\infty$. Also recall that the vectors $y=(y_1,\dots,y_N)\in\mathcal{V}_{q,N}$
are a fixed sequence indexed by $N$, with converging empirical distributions.

The idea of the proof is to study the stochastic process $\{x^0,x^1,\dots,x^t,\dots\}$ taking values in
$\mathcal{V}_{q,N}$ without conditioning on the matrix $A$. Instead, for each $t$, we will compute the conditional
distribution of $x^{t+1}$ given $x^0,\dots,x^t$, and hence $m^0,\dots,m^t$. More precisely, let $\mathfrak{S}_t$ be the $\sigma$-algebra
generated by these variables. We will compute the conditional distributions $x^{t+1}|_{\mathfrak{S}_t}$ by characterizing
the conditional distribution of the matrix $A$ given this filtration.

Throughout the proof, we identify $\mathcal{V}_{q,N}$ with the set of matrices $\mathbb{R}^{N\times q}$. Adopting this convention,
the linear operator $\mathsf{B}_t$ can be more conveniently identified with the $q\times q$ matrix

$$ \mathsf{B}_t = \frac1N\sum_{j=1}^N\frac{\partial f^j}{\partial x}(x^t_j;t). \tag{55} $$

We therefore have $\mathsf{B}_tm^{t-1}=m^{t-1}\mathsf{B}_t^{\sf T}$ (operator on the left, matrix product on the right), and the
equations for $x^1,\dots,x^t$ can be written in matrix form as:

$$ \underbrace{\big[x^1\,\big|\,x^2+m^0\mathsf{B}_1^{\sf T}\,\big|\,\cdots\,\big|\,x^t+m^{t-2}\mathsf{B}_{t-1}^{\sf T}\big]}_{Y_{t-1}} = A\,\underbrace{\big[m^0\,\big|\,\cdots\,\big|\,m^{t-1}\big]}_{M_{t-1}}. \tag{56} $$

In short, $Y_{t-1}=AM_{t-1}$. Here and below we use $[Q|P]$ to denote the matrix obtained by concatenating
$Q$ and $P$ horizontally.
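
The matrix identity $Y_{t-1}=AM_{t-1}$ is simply the recursion $x^{i+1}=Am^i-m^{i-1}\mathsf{B}_i^{\sf T}$ stacked column-wise, and it can be checked numerically. The sketch below uses entrywise `tanh` as an arbitrary stand-in for $f$ (so that the averaged Jacobian is diagonal); the function names are hypothetical.

```python
import numpy as np

def run_orbit(A, x0, T):
    """Symmetric AMP with f = tanh (arbitrary choice); returns (xs, ms, Bs)."""
    N, q = x0.shape
    xs, ms, Bs = [x0], [], []
    m_prev = np.zeros((N, q))                      # m^{-1} = 0
    for _ in range(T):
        m = np.tanh(xs[-1])                        # m^t = f(x^t; t)
        B = np.diag((1.0 - m ** 2).mean(axis=0))   # B_t: average Jacobian of tanh at x^t
        xs.append(A @ m - m_prev @ B.T)            # x^{t+1} = A m^t - m^{t-1} B_t^T
        ms.append(m)
        Bs.append(B)
        m_prev = m
    return xs, ms, Bs

def stack_Y_M(xs, ms, Bs, t):
    """Build Y_{t-1} and M_{t-1} as in Eq. (56)."""
    N, q = xs[0].shape
    blocks = []
    for i in range(t):
        m_prev = ms[i - 1] if i > 0 else np.zeros((N, q))
        blocks.append(xs[i + 1] + m_prev @ Bs[i].T)
    return np.hstack(blocks), np.hstack(ms[:t])
```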

We also introduce the notation $m^t_\parallel$ for the projection of $m^t$ onto the column space of $M_{t-1}$. More
precisely, $m^t_\parallel\in\mathbb{R}^{N\times q}$ is the matrix whose columns are the projections of the columns of $m^t$. This
can be written as

$$ m^t_\parallel = \sum_{i=0}^{t-1}m^i\alpha_i , \tag{57} $$

where $\alpha_i\in\mathbb{R}^{q\times q}$, $0\le i\le t-1$, contain the coefficients of these projections. Defining by $m^t_\perp=m^t-m^t_\parallel$
the perpendicular component, we have $M_{t-1}^{\sf T}m^t_\perp=0$. We further denote by $\alpha\in\mathbb{R}^{tq\times q}$ the matrix
obtained by concatenating the $\alpha_i$'s vertically. Using this notation, we have

$$ m^t_\parallel = M_{t-1}\alpha . \tag{58} $$
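
Numerically, the coefficient matrix $\alpha$ in (58) is a least-squares solution of $M_{t-1}\alpha\approx m^t$. The sketch below (hypothetical function name) computes the decomposition with `numpy.linalg.lstsq`; the orthogonality relation $M_{t-1}^{\sf T}m^t_\perp=0$ then holds up to floating-point error.

```python
import numpy as np

def split_parallel_perp(M, m_t):
    """Decompose m_t = m_par + m_perp with m_par = M @ alpha in the column space of M."""
    alpha, *_ = np.linalg.lstsq(M, m_t, rcond=None)   # least-squares coefficients
    m_par = M @ alpha
    return alpha, m_par, m_t - m_par
```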

For an integer $\ell\ge1$, let $\langle\ell\rangle=\{(\ell-1)q+1,\dots,\ell q\}$. For a matrix $u$ and sets of indices $I$, $J$, we
let $u_{I,J}$ denote the submatrix formed by the rows in $I$ and columns in $J$. We further let $u_I$ denote
the submatrix containing just the rows in $I$. For $v=(v_1,\dots,v_N)\in\mathcal{V}_{q,N}$ and a set of indices
$I=\{i_1,\dots,i_r\}$, let $v_I=(v_{i_1},\dots,v_{i_r})$.

Given $v\in\mathcal{V}_{q,m}$ and $\varphi:\mathbb{R}^q\to\mathbb{R}^q$, we write $\varphi(v)=(\varphi(v_1),\dots,\varphi(v_m))$. We also define $\nabla\varphi(v)=
\big[\frac{\partial\varphi}{\partial v}(v_1),\dots,\frac{\partial\varphi}{\partial v}(v_m)\big]^{\sf T}$ with $\frac{\partial\varphi}{\partial v}\in\mathbb{R}^{q\times q}$ denoting the Jacobian matrix of $\varphi$. Note that $\nabla\varphi(v)\in\mathbb{R}^{mq\times q}$.

For $u\in\mathbb{R}^{mq\times q}$, let $\langle u\rangle=(1/m)\sum_{\ell=1}^m u_{\langle\ell\rangle}\in\mathbb{R}^{q\times q}$. Also, for $u,v\in\mathcal{V}_{q,N}$ we define

$$ \langle u,v\rangle = \frac1N\sum_{i=1}^N u_iv_i^{\sf T}. $$

Note that $\langle u,v\rangle=(1/N)\,u^{\sf T}v$, as we regard $\mathcal{V}_{q,N}=\mathbb{R}^{N\times q}$.

Given two random variables $X$, $Y$, and a $\sigma$-algebra $\mathfrak{S}$, the notation $X|_{\mathfrak{S}}\stackrel{\rm d}{=}Y$ means that for any
integrable function $\phi$ and for any random variable $Z$ measurable on $\mathfrak{S}$, $\mathbb{E}\{\phi(X)Z\}=\mathbb{E}\{\phi(Y)Z\}$. In
words, we will say that $X$ is distributed as (or is equal in distribution to) $Y$ conditional on $\mathfrak{S}$. In
case $\mathfrak{S}$ is the trivial $\sigma$-algebra we simply write $X\stackrel{\rm d}{=}Y$ (i.e. $X$ and $Y$ are equal in distribution). For
random variables $X$, $Y$ the notation $X\stackrel{\rm a.s.}{=}Y$ means that $X$ and $Y$ are equal almost surely.

The large system limit will be denoted as $\lim_{N\to\infty}$. In the large system limit, we use the notation
$o_t(1)$ to represent a matrix in $\mathbb{R}^{tq\times q}$ (with $t$ fixed) such that all of its coordinates converge to $0$ almost
surely as $N\to\infty$.

The indicator function of property $A$ is denoted by $\mathbb{I}(A)$ or $\mathbb{I}_A$. The normal distribution with
mean $\mu$ and variance $v^2$ is represented as $\mathsf{N}(\mu,v^2)$.

4.2 Main technical Lemma

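We will say that a convergent sequence of mappings $(\mathcal{F}_N)_{N\in\mathbb{N}}$ is non-trivial if there exists $\varepsilon_0 > 0$
such that, for each $N$, $t \ge 0$, $a \in [q]$, $i \in [N]$, $\gamma \in \mathbb{R}^q$ with $\|\gamma\|_2 = 1$, $b \in \mathbb{R}$, we have

$$\int \big(\gamma^{\sf T} g(x, y_i, a, t) - b\big)^2 \, {\rm d}x > \varepsilon_0 .$$

This condition is useful to rule out trivial degeneracies.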

Lemma 2. Let $\{(A(N), \mathcal{F}_N, x^{0,N})\}_N$ be a converging sequence of AMP instances as in Theorem 1,
with $\mathcal{F}_N$ non-trivial. Then the following hold for all $t \in \mathbb{N}$:

(a)

$$x^{t+1}\big|_{\mathfrak{S}_t} \stackrel{\rm d}{=} \sum_{i=0}^{t-1} x^{i+1}\alpha_i + \tilde{A}\, m^t_\perp + \tilde{M}_{t-1}\, o_{t-1}(1), \tag{59}$$

where $\tilde{A}$ is an independent copy of $A$. The matrix $\tilde{M}_t$ is such that its columns form an orthogonal
basis for the column space of $M_t$ and $\tilde{M}_t^{\sf T}\tilde{M}_t = N\, {\rm I}_{tq\times tq}$. Recall that $o_{t-1}(1) \in \mathbb{R}^{(t-1)q\times q}$
is a random vector that converges to 0 almost surely as $N \to \infty$.

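(b) For any pseudo-Lipschitz function $\phi : (\mathbb{R}^q)^{t+2} \to \mathbb{R}$ of order $k$,

$$\lim_{N\to\infty} \frac{1}{|C_a^N|}\sum_{i\in C_a^N} \phi(x_i^1, \dots, x_i^{t+1}, y_i) \stackrel{\rm a.s.}{=} \mathbb{E}\big[\phi(Z_a^1, \dots, Z_a^{t+1}, Y_a)\big], \tag{60}$$

where $(Z_a^1, \dots, Z_a^{t+1})$ is a Gaussian vector independent of $Y_a \sim P_a$ and, for each $i$, $Z_a^i \sim \mathsf{N}(0, \Sigma^i)$.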

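(c) For all $1 \le r, s \le t$, $a \in [q]$, the following equations hold and all limits exist, are bounded and
have degenerate distribution (i.e. they are constant random variables):

$$\lim_{N\to\infty} \langle x^{r+1}_{C_a^N}, x^{s+1}_{C_a^N} \rangle \stackrel{\rm a.s.}{=} \lim_{N\to\infty} \langle m^r, m^s \rangle. \tag{61}$$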

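(d) Consider any set of $q$ Lipschitz continuous functions $\varphi_a : \mathbb{R}^q \times \mathbb{R}^q \to \mathbb{R}^q$. For all $1 \le r, s \le t$,
the following equations hold and all limits exist, are bounded and have degenerate distribution
(i.e. they are constant random matrices):

$$\lim_{N\to\infty} \langle x^{r+1}_{C_a^N}, \varphi_a(x^{s+1}_{C_a^N}, y_{C_a^N}) \rangle \stackrel{\rm a.s.}{=} \lim_{N\to\infty} \langle x^{r+1}_{C_a^N}, x^{s+1}_{C_a^N} \rangle \, \langle \nabla\varphi_a(x^{s+1}_{C_a^N}, y_{C_a^N}) \rangle. \tag{62}$$

The Jacobians here are computed according to the first component. Define $\varphi : \mathcal{V}_{q,N} \times \mathcal{V}_{q,N} \to
\mathcal{V}_{q,N}$ by letting $v' = \varphi(u, v)$ be given by $v'_i = \varphi_a(u_i, v_i)$ for $i \in C_a^N$. Let $\nabla\varphi \in \mathbb{R}^{Nq\times q}$ be the
matrix obtained by concatenating the matrices $\nabla\varphi_a \in \mathbb{R}^{|C_a^N| q\times q}$ for $a \in [q]$. Then, Eq. (62)
implies that for all $1 \le r, s \le t$, the following equations hold:

$$\lim_{N\to\infty} \langle x^{r+1}, \varphi(x^{s+1}, y) \rangle \stackrel{\rm a.s.}{=} \lim_{N\to\infty} \langle x^{r+1}, x^{s+1} \rangle \, \langle \nabla\varphi(x^{s+1}, y) \rangle. \tag{63}$$

(e) For $\ell = k - 1$ and $a \in [q]$, the following holds almost surely

$$\limsup_{N\to\infty} \frac{1}{|C_a^N|}\sum_{i\in C_a^N} \|x_i^{t+1}\|^{2\ell} < \infty. \tag{64}$$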



(f) For all $0 \le r \le t$ the following limit exists and there are positive constants $\rho_r$ (independent of
$N$) such that almost surely

$$\lim_{N\to\infty} \langle m^r_\perp, m^r_\perp \rangle - \rho_r\, {\rm I}_{q\times q} \succeq 0. \tag{65}$$



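4.2.1 Proof of Theorem 1

First assume that the sequence of functions $\mathcal{F}_N$ is non-trivial. Theorem 1 follows readily from
Lemma 2. More specifically, Theorem 1 is obtained by applying Lemma 2(b) to functions $\phi(x_i^1, \dots, x_i^t, y_i)$
that depend only on the last iterate and on $y_i$, i.e. $\phi(x_i^1, \dots, x_i^t, y_i) = \psi(x_i^t, y_i)$.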

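Consider then the case in which $\mathcal{F}_N$ is not non-trivial. In this case we perturb the functions
$g(x, y, a, t)$ as follows. Let $\varphi : \mathbb{R}^q \to \mathbb{R}^q$ be a bounded smooth function. Define

$$g_\varepsilon(x, y, a, t) = g(x, y, a, t) + \varepsilon\, \varphi(x).$$

The resulting sequence of instances is then non-trivial and state evolution applies. Call $\Sigma^t(\varepsilon)$ the
resulting state evolution sequence, and denote by $x^t(\varepsilon)$ the corresponding orbit. Applying Theorem 1,
we have

$$\lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} \psi(x_i^t(\varepsilon), y_i) \stackrel{\rm a.s.}{=} \mathbb{E}\{\psi(Z^t(\varepsilon), Y_a)\}, \tag{66}$$

with $Z^t(\varepsilon) \sim \mathsf{N}(0, \Sigma^t(\varepsilon))$. In order to prove the same theorem for the orbit $\{x^t\}_{t\ge 0}$, we need to show
the following two facts:

(i) $\lim_{\varepsilon\to 0} \mathbb{E}\{\psi(Z^t(\varepsilon), Y_a)\} = \mathbb{E}\{\psi(Z^t, Y_a)\}$, with $Z^t \sim \mathsf{N}(0, \Sigma^t)$.

(ii) Let $a_N(\varepsilon) = \frac{1}{N}\sum_{i=1}^{N} \psi(x_i^t(\varepsilon), y_i)$. Then $|a_N(\varepsilon) - a_N(0)| \le C\varepsilon$, with the constant $C$ being independent
of $N$.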

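Given (i)-(ii), we have

$$\begin{aligned}
\lim_{N\to\infty} \big|a_N(0) - \mathbb{E}\{\psi(Z^t, Y_a)\}\big| &\le \limsup_{N\to\infty} \Big\{ |a_N(0) - a_N(\varepsilon)| + \big|a_N(\varepsilon) - \mathbb{E}\{\psi(Z^t(\varepsilon), Y_a)\}\big| \Big\} \\
&\quad + \big|\mathbb{E}\{\psi(Z^t(\varepsilon), Y_a)\} - \mathbb{E}\{\psi(Z^t, Y_a)\}\big| \\
&\le C\varepsilon + 0 + \big|\mathbb{E}\{\psi(Z^t(\varepsilon), Y_a)\} - \mathbb{E}\{\psi(Z^t, Y_a)\}\big|,
\end{aligned}$$

where the last step follows from (ii) and Eq. (66). Therefore, taking the limit of both sides as $\varepsilon \to 0$,

$$\lim_{N\to\infty} \big|a_N(0) - \mathbb{E}\{\psi(Z^t, Y_a)\}\big| \le \lim_{\varepsilon\to 0} C\varepsilon + \lim_{\varepsilon\to 0} \big|\mathbb{E}\{\psi(Z^t(\varepsilon), Y_a)\} - \mathbb{E}\{\psi(Z^t, Y_a)\}\big| = 0,$$

where the last step follows from (i). This proves Theorem 1 for $\{x^t\}_{t\ge 0}$.

It remains to prove facts (i)-(ii). The claim in (i) follows readily by applying the dominated convergence
theorem and noting that $\psi(\cdot, \cdot)$ is Lipschitz continuous.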
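To prove (ii), write

$$\begin{aligned}
|a_N(\varepsilon) - a_N(0)| &\le \frac{1}{N}\sum_{i=1}^{N} \big|\psi(x_i^t(\varepsilon), y_i) - \psi(x_i^t, y_i)\big| \\
&\le \frac{L}{N}\sum_{i=1}^{N} \big(1 + \|x_i^t(\varepsilon)\|^{k-1} + \|x_i^t\|^{k-1} + \|y_i\|^{k-1}\big)\, \|x_i^t(\varepsilon) - x_i^t\| \\
&\le L \Big\{ \frac{1}{N}\sum_{i=1}^{N} \big(1 + \|x_i^t(\varepsilon)\|^{k-1} + \|x_i^t\|^{k-1} + \|y_i\|^{k-1}\big)^2 \Big\}^{1/2} \Big\{ \frac{1}{N}\sum_{i=1}^{N} \|x_i^t(\varepsilon) - x_i^t\|^2 \Big\}^{1/2} \\
&\le 3L \Big\{ 1 + \frac{1}{N}\sum_{i=1}^{N} \|x_i^t(\varepsilon)\|^{2k-2} + \frac{1}{N}\sum_{i=1}^{N} \|x_i^t\|^{2k-2} + \frac{1}{N}\sum_{i=1}^{N} \|y_i\|^{2k-2} \Big\}^{1/2} \Big\{ \frac{1}{N}\sum_{i=1}^{N} \|x_i^t(\varepsilon) - x_i^t\|^2 \Big\}^{1/2},
\end{aligned}$$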

where the second inequality holds since $\psi \in {\rm PL}(k)$ and the third inequality follows from the Cauchy-Schwarz
inequality. In the last expression, the term in the first braces is bounded using the assumption on the
second moment of $y$ and using part (e) of Lemma 2 for the orbit $\{x^t(\varepsilon)\}$. To bound the second braces,
note that both $A$ and $B_t$ in the AMP iteration have bounded operator norm (the former with
probability $1 - e^{-\Theta(N)}$). Since $g(\cdot, t)$ is Lipschitz continuous and $\varphi(x)$ is bounded by assumption,
we conclude that $\|x^t(\varepsilon) - x^t\|^2 \le c^t N \varepsilon^2$ for some absolute constant $c$. This completes the proof of
fact (ii).

4.3 Proof of Lemma 2

The proof is by induction on $t$. Let $\mathcal{B}_t$ be the property that (59), (60), (61), (63), (64), and (65)
hold.

4.3.1 Induction basis: $\mathcal{B}_0$

Note that $x^1 = A m^0$.

(a) $\mathfrak{S}_0$ is generated by $y$, $x^0$ and $m^0$. Also $m^0 = m^0_\perp$ since $M_{-1}$ is an empty matrix. Hence

$$x^1\big|_{\mathfrak{S}_0} = A\, m^0_\perp.$$

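(b) Let $\phi : \mathcal{V}_{q,2} \to \mathbb{R}$ be a pseudo-Lipschitz function of order $k$. Hence, $|\phi(x)| \le L(1 + \|x\|^k)$.
Given $m^0$, $y$, the random variable $\sum_{i\in C_a^N} \phi([Am^0]_i, y_i)/|C_a^N|$ is a sum of independent random
variables. By Lemma 4(a), $[Am^0]_i \stackrel{\rm d}{=} Z$ for $Z \sim \mathsf{N}(0, \langle m^0, m^0\rangle)$. Further,

$$\lim_{N\to\infty} \langle m^0, m^0 \rangle = \sum_{a\in[q]} c_a \lim_{N\to\infty} \frac{1}{|C_a^N|}\sum_{i\in C_a^N} m_i^0 (m_i^0)^{\sf T} = \sum_{a\in[q]} c_a \hat{\Sigma}_a^0 = \Sigma^1.$$

Hence for all $p \ge 1$, there exists a constant $c_p$ such that $\mathbb{E}\{\|[Am^0]_i\|^p\} \le \|\langle m^0_\perp, m^0_\perp\rangle^{1/2}\|_2^p\, \mathbb{E}\|Z\|^p <
c_p$, with $Z \sim \mathsf{N}(0, {\rm I}_q)$. Next, we check the conditions of Theorem 2 for $X_{N,i} = \phi(x_i^1, y_i) -
\mathbb{E}_A\phi(x_i^1, y_i)$, for some $\kappa > 0$: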



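$$\begin{aligned}
\frac{1}{|C_a^N|}\sum_{i\in C_a^N} \mathbb{E}|X_{N,i}|^{2+\kappa} &= \frac{1}{|C_a^N|}\sum_{i\in C_a^N} \mathbb{E}\big|\phi(x_i^1, y_i) - \mathbb{E}_A\{\phi(x_i^1, y_i)\}\big|^{2+\kappa} \\
&= \frac{1}{|C_a^N|}\sum_{i\in C_a^N} \mathbb{E}_A \Big| \mathbb{E}_{\tilde{A}}\big\{\phi([Am^0]_i, y_i) - \phi([\tilde{A}m^0]_i, y_i)\big\} \Big|^{2+\kappa} \\
&\le \frac{1}{|C_a^N|}\sum_{i\in C_a^N} \mathbb{E}_{A,\tilde{A}} \big| \phi([Am^0]_i, y_i) - \phi([\tilde{A}m^0]_i, y_i) \big|^{2+\kappa} \\
&\le c + \frac{c'}{|C_a^N|}\sum_{i\in C_a^N} \|y_i\|^{(k-1)(2+\kappa)} \\
&\le c + c'\, |C_a^N|^{\kappa/2} \Big\{ \frac{1}{|C_a^N|}\sum_{i\in C_a^N} \|y_i\|^{2(k-1)} \Big\}^{1+\kappa/2} \le c''\, |C_a^N|^{\kappa/2}.
\end{aligned} \tag{67}$$

Here $\tilde{A}$ is an independent copy of $A$, and the last inequality uses the assumption on the empirical
moments of $\{y_i\}_{i\in C_a^N}$. By applying Theorem 2 we get

$$\lim_{N\to\infty} \frac{1}{|C_a^N|}\sum_{i\in C_a^N} \big[\phi(x_i^1, y_i) - \mathbb{E}_A\{\phi(x_i^1, y_i)\}\big] \stackrel{\rm a.s.}{=} 0.$$

Hence, using Lemma 6 for $v = y$ and $\psi(y_i) = \mathbb{E}_Z\{\phi(Z, y_i)\}$ we get

$$\lim_{N\to\infty} \frac{1}{|C_a^N|}\sum_{i\in C_a^N} \mathbb{E}_A[\phi(x_i^1, y_i)] \stackrel{\rm a.s.}{=} \mathbb{E}\{\phi(Z_a, Y_a)\},$$

with $Z_a \sim \mathsf{N}(0, \Sigma^1)$ independent of $Y_a \sim P_a$. Note that $\psi$ belongs to ${\rm PL}(k)$ since $\phi$ belongs to
${\rm PL}(k)$.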

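(c) Let $\tilde{A} = A_{C_a^N}$ be the submatrix formed by the rows in $C_a^N$. Using Lemma 4(c), conditioned on $m^0$,

$$\lim_{N\to\infty} \langle x^1_{C_a^N}, x^1_{C_a^N} \rangle = \lim_{N\to\infty} \langle \tilde{A}m^0, \tilde{A}m^0 \rangle \stackrel{\rm a.s.}{=} \lim_{N\to\infty} \langle m^0, m^0 \rangle = \Sigma^1.$$

(d) Write

$$\lim_{N\to\infty} \langle x^1_{C_a^N}, \varphi_a(x^1_{C_a^N}, y_{C_a^N}) \rangle = \lim_{N\to\infty} \frac{1}{|C_a^N|}\sum_{i\in C_a^N} x_i^1 [\varphi_a(x_i^1, y_i)]^{\sf T} \stackrel{\rm a.s.}{=} \mathbb{E}\big(Z_a [\varphi_a(Z_a, Y_a)]^{\sf T}\big),$$

where the last step follows by applying $\mathcal{B}_0(b)$ to the functions $\phi(x_i^1, y_i) = x_i^1(\ell)\,[\varphi_a(x_i^1, y_i)]_k$,
for all $\ell, k \in [q]$. Furthermore, using Lemma 5,

$$\mathbb{E}\big(Z_a [\varphi_a(Z_a, Y_a)]^{\sf T}\big) = \Sigma^1\, \mathbb{E}\Big(\Big[\frac{\partial\varphi_a}{\partial x}(Z_a, Y_a)\Big]^{\sf T}\Big).$$

As proved in part (c), $\lim_{N\to\infty} \langle x^1_{C_a^N}, x^1_{C_a^N} \rangle = \Sigma^1$. Also, by part (b), the empirical distribution
of $\{(x_i^1, y_i)\}_{i\in C_a^N}$ converges weakly to the distribution of $(Z_a, Y_a)$, and consequently we get

$$\lim_{N\to\infty} \langle \nabla\varphi_a(x^1_{C_a^N}, y_{C_a^N}) \rangle \stackrel{\rm a.s.}{=} \mathbb{E}\Big(\Big[\frac{\partial\varphi_a}{\partial x}(Z_a, Y_a)\Big]^{\sf T}\Big).$$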

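This proves Eq. (62). To prove Eq. (63), notice that

$$\langle x^1, \varphi(x^1, y) \rangle = \sum_{a\in[q]} c_a \langle x^1_{C_a^N}, \varphi_a(x^1_{C_a^N}, y_{C_a^N}) \rangle. \tag{68}$$

Also,

$$\lim_{N\to\infty} \langle x^1, x^1 \rangle = \sum_{a\in[q]} c_a \lim_{N\to\infty} \langle x^1_{C_a^N}, x^1_{C_a^N} \rangle = \sum_{a\in[q]} c_a \Sigma^1 = \Sigma^1, \tag{69}$$

where the last step holds since $\sum_{a\in[q]} c_a = 1$. Further,

$$\langle \nabla\varphi(x^1, y) \rangle = \sum_{a\in[q]} c_a \langle \nabla\varphi_a(x^1_{C_a^N}, y_{C_a^N}) \rangle. \tag{70}$$

Combining Eqs. (68), (69), (70) and Eq. (62), we get the desired result.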



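(e) Similar to (b), conditioning on $m^0$, the term $\sum_{i\in C_a^N} \|[Am^0]_i\|^{2\ell}/|C_a^N|$ is a sum of independent
random variables and

$$\mathbb{E}\{\|[Am^0]_i\|^p\} \le \|\langle m^0_\perp, m^0_\perp \rangle^{1/2}\|_2^p\, \mathbb{E}\{\|Z\|^p\} < c_p,$$

for a constant $c_p$. Therefore, by Theorem 2, we get

$$\lim_{N\to\infty} \frac{1}{|C_a^N|}\sum_{i\in C_a^N} \Big[ \|[Am^0]_i\|^{2\ell} - \mathbb{E}_A\{\|[Am^0]_i\|^{2\ell}\} \Big] \stackrel{\rm a.s.}{=} 0.$$

But $\frac{1}{|C_a^N|}\sum_{i\in C_a^N} \mathbb{E}_A\{\|[Am^0]_i\|^{2\ell}\} \le \|\langle m^0, m^0 \rangle^{1/2}\|_2^{2\ell}\, \mathbb{E}_Z\{\|Z\|^{2\ell}\} < \infty$.

(f) Since $t = 0$ and $m^0 = m^0_\perp$, the result follows from $\lim_{N\to\infty} \langle m^0, m^0 \rangle = \Sigma^1$ and the fact that $\Sigma^1 =
\sum_{b\in[q]} c_b \hat{\Sigma}_b^0 \succ 0$.

4.3.2 Proof of $\mathcal{B}_t$

Suppose that $\mathcal{B}_{t-1}$ holds. We prove $\mathcal{B}_t$.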
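(f) It is sufficient to consider $r = t$. Write $m^t_\perp = m^t - m^t_\parallel$ and recall that $m^t_\parallel = \sum_{s=0}^{t-1} m^s \alpha_s$.
Hence, for any $\gamma_0 \in \mathbb{R}^q$, with $\|\gamma_0\| = 1$, we have

$$\gamma_0^{\sf T} \langle m^t_\perp, m^t_\perp \rangle \gamma_0 = \frac{1}{N}\sum_{i=1}^{N} \Big( \gamma_0^{\sf T} m_i^t - \sum_{s=0}^{t-1} \gamma_0^{\sf T} \alpha_s^{\sf T} m_i^s \Big) \Big( \gamma_0^{\sf T} m_i^t - \sum_{s=0}^{t-1} \gamma_0^{\sf T} \alpha_s^{\sf T} m_i^s \Big)^{\sf T}.$$

Note that the matrix $\alpha = (\alpha_0, \dots, \alpha_{t-1})$ has a finite limit as $N \to \infty$ by the induction
hypothesis $\mathcal{B}_{t-1}(b)$. Furthermore, $m_i^t = g(x_i^t, y_i, a, t)$, for $i \in C_a^N$. By the induction hypothesis
$\mathcal{B}_{t-1}(a)$, it is sufficient to show that there exists $\rho > 0$ depending on $t$ such that

$$\liminf_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} \Big( \gamma_0^{\sf T} g\Big(Z + \sum_{r=1}^{t-1} \alpha_{r-1}^{\sf T} x_i^r,\, y_i, a, t\Big) - \sum_{s=0}^{t-1} \gamma_0^{\sf T} \alpha_s^{\sf T} m_i^s \Big)^2 \ge \rho, \tag{71}$$

where $Z = (\tilde{A} m^{t-1}_\perp)^{\sf T} e_i \in \mathbb{R}^q$ ($e_i$ being the $i$-th element of the canonical basis). By the strong
law of large numbers for triangular arrays, the above is lower bounded by

$$\begin{aligned}
&\liminf_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} \mathbb{E}_{\tilde{A}} \Big\{ \Big( \gamma_0^{\sf T} g\Big(Z + \sum_{r=1}^{t-1} \alpha_{r-1}^{\sf T} x_i^r,\, y_i, a, t\Big) - \sum_{s=0}^{t-1} \gamma_0^{\sf T} \alpha_s^{\sf T} m_i^s \Big)^2 \Big\} \\
&\qquad \ge \liminf_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} {\rm Var}_Z\Big( \gamma_0^{\sf T} g\Big(Z + \sum_{r=1}^{t-1} \alpha_{r-1}^{\sf T} x_i^r,\, y_i, a, t\Big) \Big).
\end{aligned}$$

The variance in the last expression is taken only with respect to $\tilde{A}$ or, equivalently, with respect
to $Z \sim \mathsf{N}(0, \langle m^{t-1}_\perp, m^{t-1}_\perp \rangle)$. Notice that the covariance of $Z$ is lower bounded by $\rho'\, {\rm I}_{q\times q}$ for some
$\rho' > 0$, by the induction hypothesis $\mathcal{B}_{t-1}(f)$. It is a straightforward probability exercise to show
that, for any non-constant continuous function $G : \mathbb{R}^q \to \mathbb{R}$, and any $U > 0$, there exists $\varepsilon > 0$
such that

$$\inf_{\|a\|_2 \le U} {\rm Var}_Z\big(G(a + Z)\big) > \varepsilon.$$

Using $\mathcal{B}_{t-1}(e)$, we can choose $U$ large enough to ensure that there exist at least $N/2$ values of
the indices $i \in [N]$ such that $\|\sum_{r=1}^{t-1} \alpha_{r-1}^{\sf T} x_i^r\| \le U$. Note that $U$ and therefore $\varepsilon$ depend on $t$
but do not depend on $N$. The lower bound (71) follows then by taking $\rho = \varepsilon/4$.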
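The "straightforward probability exercise" invoked above can be explored numerically. The sketch below is an illustration only, under assumptions that are not in the paper: the scalar case $q = 1$, $G = \tanh$, $Z \sim \mathsf{N}(0,1)$, and $U = 3$. It evaluates ${\rm Var}_Z(G(a + Z))$ by trapezoidal quadrature on a grid of shifts $|a| \le U$ and records the smallest value, which plays the role of $\varepsilon$.

```python
# Sketch (q = 1): Var_Z(G(a + Z)) stays bounded away from zero over |a| <= U
# for a non-constant continuous G. Here G = tanh and U = 3 are assumptions.
import math

def gaussian_variance(G, a, sigma=1.0, half_width=8.0, steps=4000):
    """Variance of G(a + Z), Z ~ N(0, sigma^2), via the trapezoidal rule."""
    h = 2.0 * half_width * sigma / steps
    mass = m1 = m2 = 0.0
    for j in range(steps + 1):
        z = -half_width * sigma + j * h
        # Unnormalised Gaussian weight, halved at the two end points.
        w = math.exp(-0.5 * (z / sigma) ** 2) * (0.5 if j in (0, steps) else 1.0)
        g = G(a + z)
        mass += w
        m1 += w * g
        m2 += w * g * g
    m1 /= mass
    m2 /= mass
    return m2 - m1 * m1

U = 3.0
shifts = [-U + 0.25 * j for j in range(int(2 * U / 0.25) + 1)]
variances = [gaussian_variance(math.tanh, a) for a in shifts]
eps = min(variances)          # numerical stand-in for the epsilon in the text
```

The minimum is attained at the edge of the interval, where $\tanh$ saturates, which matches the statement that $\varepsilon$ depends on $U$. A constant $G$ gives zero variance, which is why non-triviality is needed.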
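(a) Let $B_t \in \mathbb{R}^{q\times q}$ be given by $B_t = \frac{1}{N}\big[\sum_{j\in[N]} \frac{\partial f}{\partial x}(x_j^t; t)\big]$. Further let $\mathbf{B}$ be a square block-diagonal
matrix of size $tq$ with the matrices $B_j^{\sf T}$ on its diagonal. Define $X_{t-1} = [x^1 | x^2 | \cdots | x^t]$.
Recalling the definition of $Y_{t-1}$ and $M_{t-1}$ from Section 4.1, we have $Y_{t-1} = X_{t-1} + [0 | M_{t-2}]\, \mathbf{B}$.

Lemma 3. The following holds

$$x^{t+1}\big|_{\mathfrak{S}_t} \stackrel{\rm d}{=} X_{t-1}(M_{t-1}^{\sf T} M_{t-1})^{-1} M_{t-1}^{\sf T} m^t_\parallel + P^\perp_{M_{t-1}} \tilde{A}\, m^t_\perp + M_{t-1}\, o_{t-1}(1).$$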



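Proof. Lemma 10 in [BM11] implies that $A|_{\mathfrak{S}_t} \stackrel{\rm d}{=} \mathbb{E}\{A|\mathfrak{S}_t\} + \mathcal{P}_t(\tilde{A})$, where $\tilde{A} \stackrel{\rm d}{=} A$ is a random
matrix independent of $\mathfrak{S}_t$ and $\mathcal{P}_t$ is the orthogonal projector onto the subspace $V_t = \{\hat{A} \,|\, \hat{A} M_{t-1} =
0,\ \hat{A} = \hat{A}^{\sf T}\}$. Following the same argument as in [BM11], we have

$$\begin{aligned}
\mathbb{E}(A|\mathfrak{S}_t) &= Y_{t-1}(M_{t-1}^{\sf T} M_{t-1})^{-1} M_{t-1}^{\sf T} + M_{t-1}(M_{t-1}^{\sf T} M_{t-1})^{-1} Y_{t-1}^{\sf T} \\
&\quad - M_{t-1}(M_{t-1}^{\sf T} M_{t-1})^{-1} M_{t-1}^{\sf T} Y_{t-1} (M_{t-1}^{\sf T} M_{t-1})^{-1} M_{t-1}^{\sf T}, \\
\mathcal{P}_t(\tilde{A}) &= P^\perp_{M_{t-1}} \tilde{A}\, P^\perp_{M_{t-1}}.
\end{aligned}$$

Using $M_{t-1}^{\sf T} m^t_\perp = 0$ and $Y_{t-1} = A M_{t-1}$, it is immediate to see that

$$A|_{\mathfrak{S}_t}\, m^t \stackrel{\rm d}{=} Y_{t-1}(M_{t-1}^{\sf T} M_{t-1})^{-1} M_{t-1}^{\sf T} m^t_\parallel + M_{t-1}(M_{t-1}^{\sf T} M_{t-1})^{-1} Y_{t-1}^{\sf T} m^t_\perp + P^\perp_{M_{t-1}} \tilde{A}\, m^t_\perp.$$

Moreover, $Y_{t-1}^{\sf T} m^t_\perp = X_{t-1}^{\sf T} m^t_\perp$ because $M_{t-2}^{\sf T} m^t_\perp = 0$. Recalling $m^t_\parallel = M_{t-1}\alpha$, we need to show

$$[0 | M_{t-2}]\, \mathbf{B}\, \alpha + M_{t-1}(M_{t-1}^{\sf T} M_{t-1})^{-1} X_{t-1}^{\sf T} m^t_\perp - m^{t-1} B_t^{\sf T} = M_{t-1}\, o_{t-1}(1). \tag{72}$$

Note that we used the fact $B_t m^{t-1} = m^{t-1} B_t^{\sf T}$, which follows from our convention $\mathcal{V}_{q,N} = \mathbb{R}^{N\times q}$.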

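Here is our strategy to prove (72). The left hand side is a linear combination of $m^0, \dots, m^{t-1}$.
For any $\ell = 1, \dots, t$ we will prove that the coefficient of $m^{\ell-1}$ converges to 0. Note that the
coefficients are matrices of size $q$. The coefficient of $m^{\ell-1}$ in the left hand side is equal to

$$\big[(M_{t-1}^{\sf T} M_{t-1})^{-1} X_{t-1}^{\sf T} m^t_\perp\big]_{(\ell)} - B_\ell^{\sf T}(-\alpha_\ell)^{\mathbb{I}_{\ell\ne t}} = \sum_{r=1}^{t} \Big[\Big(\frac{M_{t-1}^{\sf T} M_{t-1}}{N}\Big)^{-1}\Big]_{(\ell),(r)} \Big\langle x^r,\, m^t - \sum_{s=0}^{t-1} m^s \alpha_s \Big\rangle - B_\ell^{\sf T}(-\alpha_\ell)^{\mathbb{I}_{\ell\ne t}}.$$

To simplify the notation denote the matrix $M_{t-1}^{\sf T} M_{t-1}/N$ by $G$. Therefore,

$$\lim_{N\to\infty} \text{Coefficient of } m^{\ell-1} = \lim_{N\to\infty} \Big\{ \sum_{r=1}^{t} (G^{-1})_{(\ell),(r)} \Big\langle x^r,\, m^t - \sum_{s=0}^{t-1} m^s \alpha_s \Big\rangle - B_\ell^{\sf T}(-\alpha_\ell)^{\mathbb{I}_{\ell\ne t}} \Big\}.$$

But using the induction hypothesis $\mathcal{B}_{t-1}(d)$ for $\varphi = f(\cdot\,; 1), \dots, f(\cdot\,; t)$, the term $\langle x^r, m^t -
\sum_{s=0}^{t-1} m^s \alpha_s \rangle$ is almost surely equal to the limit of $\langle x^r, x^t \rangle B_t^{\sf T} - \sum_{s=0}^{t-1} \langle x^r, x^s \rangle B_s^{\sf T} \alpha_s$. This can be
modified, using the induction hypothesis $\mathcal{B}_{t-1}(c)$, to $\langle m^{r-1}, m^{t-1} \rangle B_t^{\sf T} - \sum_{s=0}^{t-1} \langle m^{r-1}, m^{s-1} \rangle B_s^{\sf T} \alpha_s$
almost surely, which can be written as $G_{(r),(t)} B_t^{\sf T} - \sum_{s=0}^{t-1} G_{(r),(s)} B_s^{\sf T} \alpha_s$. Hence,

$$\begin{aligned}
\lim_{N\to\infty} \text{Coefficient of } m^{\ell-1} &\stackrel{\rm a.s.}{=} \lim_{N\to\infty} \Big\{ \sum_{r=1}^{t} (G^{-1})_{(\ell),(r)} \Big[ G_{(r),(t)} B_t^{\sf T} - \sum_{s=0}^{t-1} G_{(r),(s)} B_s^{\sf T} \alpha_s \Big] - B_\ell^{\sf T}(-\alpha_\ell)^{\mathbb{I}_{\ell\ne t}} \Big\} \\
&\stackrel{\rm a.s.}{=} \lim_{N\to\infty} \Big\{ B_t^{\sf T}\, \mathbb{I}_{t=\ell} - \sum_{s=0}^{t-1} B_s^{\sf T} \alpha_s\, \mathbb{I}_{\ell=s} - B_\ell^{\sf T}(-\alpha_\ell)^{\mathbb{I}_{\ell\ne t}} \Big\} = 0.
\end{aligned}$$

Notice that in the above equalities we used the fact that $G$ has, almost surely, a non-singular
limit as $N \to \infty$, which was discussed in part (f). □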



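The proof of Eq. (59) follows immediately since the last lemma yields

$$x^{t+1}\big|_{\mathfrak{S}_t} \stackrel{\rm d}{=} \sum_{i=0}^{t-1} x^{i+1}\alpha_i + \tilde{A}\, m^t_\perp - \tilde{M}_{t-1}(\tilde{M}_{t-1}^{\sf T} \tilde{M}_{t-1})^{-1} \tilde{M}_{t-1}^{\sf T} \tilde{A}\, m^t_\perp + M_{t-1}\, o_{t-1}(1).$$

Note that, using Lemma 4(d), as $N \to \infty$,

$$\tilde{M}_{t-1}(\tilde{M}_{t-1}^{\sf T} \tilde{M}_{t-1})^{-1} \tilde{M}_{t-1}^{\sf T} \tilde{A}\, m^t_\perp \stackrel{\rm d}{=} \tilde{M}_{t-1}\, o_{t-1}(1),$$

which finishes the proof since $\tilde{M}_{t-1}\, o_{t-1}(1) + M_{t-1}\, o_{t-1}(1) = \tilde{M}_{t-1}\, o_{t-1}(1)$.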
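(c) For $r, s < t$ we can use the induction hypothesis. For $r = t$, $s < t$,

$$\langle x^{t+1}_{C_a^N}, x^{s+1}_{C_a^N} \rangle \big|_{\mathfrak{S}_t} \stackrel{\rm d}{=} \sum_{i=0}^{t-1} \alpha_i^{\sf T} \langle x^{i+1}_{C_a^N}, x^{s+1}_{C_a^N} \rangle + \langle [P^\perp_{M_{t-1}} \tilde{A} m^t_\perp]_{C_a^N}, x^{s+1}_{C_a^N} \rangle + \sum_{i=0}^{t-1} o_i(1) \langle m^i_{C_a^N}, x^{s+1}_{C_a^N} \rangle.$$

Now, by the induction hypothesis $\mathcal{B}_{t-1}(d)$, for $\varphi(v, u) = g(v, u, a, i)$, each term $\langle m^i_{C_a^N}, x^{s+1}_{C_a^N} \rangle$ has a
finite limit. Thus,

$$\lim_{N\to\infty} \sum_{i=0}^{t-1} o_i(1) \langle m^i_{C_a^N}, x^{s+1}_{C_a^N} \rangle \stackrel{\rm a.s.}{=} 0.$$

We can use Lemma 4(b)-(d) for $\langle [P^\perp_{M_{t-1}} \tilde{A} m^t_\perp]_{C_a^N}, x^{s+1}_{C_a^N} \rangle$ to obtain $\langle [P^\perp_{M_{t-1}} \tilde{A} m^t_\perp]_{C_a^N}, x^{s+1}_{C_a^N} \rangle \to 0$,
almost surely. Finally, using the induction hypothesis $\mathcal{B}_s(c)$ or $\mathcal{B}_i(c)$ for each term of the form
$\langle x^{i+1}_{C_a^N}, x^{s+1}_{C_a^N} \rangle$,

$$\lim_{N\to\infty} \langle x^{t+1}_{C_a^N}, x^{s+1}_{C_a^N} \rangle \big|_{\mathfrak{S}_t} \stackrel{\rm a.s.}{=} \lim_{N\to\infty} \sum_{i=0}^{t-1} \alpha_i^{\sf T} \langle m^i, m^s \rangle = \lim_{N\to\infty} \langle m^t_\parallel, m^s \rangle \stackrel{\rm a.s.}{=} \lim_{N\to\infty} \langle m^t, m^s \rangle,$$

where the last line uses the definition of $\alpha_i$ and $m^t_\perp \perp m^s$.

For the case of $r = s = t$, we have

$$\langle x^{t+1}_{C_a^N}, x^{t+1}_{C_a^N} \rangle \big|_{\mathfrak{S}_t} \stackrel{\rm d}{=} \sum_{i,j=0}^{t-1} \alpha_i^{\sf T} \langle x^{i+1}_{C_a^N}, x^{j+1}_{C_a^N} \rangle \alpha_j + \langle [P^\perp_{M_{t-1}} \tilde{A} m^t_\perp]_{C_a^N}, [P^\perp_{M_{t-1}} \tilde{A} m^t_\perp]_{C_a^N} \rangle + o_1(1).$$

Note that the contributions of all products of the form $\langle [P^\perp_{M_{t-1}} \tilde{A} m^t_\perp]_{C_a^N}, x^{i+1}_{C_a^N} \rangle$ almost surely tend
to 0. Now, using the induction hypothesis $\mathcal{B}_{t-1}(c)$ and Lemma 4(c), we obtain

$$\begin{aligned}
\lim_{N\to\infty} \langle x^{t+1}_{C_a^N}, x^{t+1}_{C_a^N} \rangle \big|_{\mathfrak{S}_t} &\stackrel{\rm a.s.}{=} \lim_{N\to\infty} \sum_{i,j=0}^{t-1} \alpha_i^{\sf T} \langle m^i, m^j \rangle \alpha_j + \lim_{N\to\infty} \langle m^t_\perp, m^t_\perp \rangle \\
&= \lim_{N\to\infty} \langle m^t_\parallel, m^t_\parallel \rangle + \lim_{N\to\infty} \langle m^t_\perp, m^t_\perp \rangle \\
&= \lim_{N\to\infty} \langle m^t, m^t \rangle.
\end{aligned}$$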

(e) This part follows by a very similar argument to the one in the proof of Lemma 1 (Step $\mathcal{B}_t(e)$)
in [BM11].

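(b) Using part (a) we can write

$$\phi(x_i^1, \dots, x_i^{t+1}, y_i)\big|_{\mathfrak{S}_t} \stackrel{\rm d}{=} \phi\Big(x_i^1, \dots, x_i^t, \Big[\sum_{r=0}^{t-1} x^{r+1}\alpha_r + \tilde{A} m^t_\perp + \tilde{M}_{t-1}\, o_{t-1}(1)\Big]_i, y_i\Big).$$

We show that we can drop the error term $\tilde{M}_{t-1}\, o_{t-1}(1)$. Indeed, defining

$$\begin{aligned}
a_i &= \Big(x_i^1, \dots, x_i^t, \Big[\sum_{r=0}^{t-1} x^{r+1}\alpha_r + \tilde{A} m^t_\perp + \tilde{M}_{t-1}\, o_{t-1}(1)\Big]_i, y_i\Big), \\
b_i &= \Big(x_i^1, \dots, x_i^t, \Big[\sum_{r=0}^{t-1} x^{r+1}\alpha_r + \tilde{A} m^t_\perp\Big]_i, y_i\Big),
\end{aligned}$$

by the pseudo-Lipschitz assumption

$$|\phi(a_i) - \phi(b_i)| \le L\big(1 + \|a_i\|^{k-1} + \|b_i\|^{k-1}\big) \Big(\sum_{r=0}^{t-1} \|\tilde{m}_i^r\|\Big)\, o(1).$$

Therefore, using the Cauchy-Schwarz inequality twice, we have

$$\frac{1}{|C_a^N|} \Big| \sum_{i\in C_a^N} \phi(a_i) - \sum_{i\in C_a^N} \phi(b_i) \Big| \le L \Big[ \frac{1}{|C_a^N|}\sum_{i\in C_a^N} \big(1 + \|a_i\|^{k-1} + \|b_i\|^{k-1}\big)^2 \Big]^{1/2} \Big[ \frac{t}{|C_a^N|}\sum_{i\in C_a^N} \sum_{r=0}^{t-1} \|\tilde{m}_i^r\|^2 \Big]^{1/2} o(1). \tag{73}$$

Also note that

$$\frac{1}{|C_a^N|}\sum_{i\in C_a^N} \|a_i\|^{2\ell} \le (t+1)^{\ell} \Big\{ \sum_{r=0}^{t} \frac{1}{|C_a^N|}\sum_{i\in C_a^N} \|x_i^{r+1}\|^{2\ell} + \frac{1}{|C_a^N|}\sum_{i\in C_a^N} \|y_i\|^{2\ell} \Big\},$$

which is finite almost surely (for $\ell = k - 1$) using $\mathcal{B}_r(e)$ for $r \in [t]$ and the assumption on (the
moment of) $y$. The term $|C_a^N|^{-1} \sum_{i\in C_a^N} \|\tilde{m}_i^r\|^2$ is bounded almost surely since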

m E W hi \ 



2t 



\cz\ . 



< 



< 



C 

c 



t-1 



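where the last inequality follows from the fact that $[M_{t-1}^{\sf T} M_{t-1}/N]$ has almost surely a non-singular
limit as $N \to \infty$, as discussed in point (f) above. Finally, for $r \le t - 1$, each term
$(1/|C_a^N|)\sum_{i\in C_a^N} \|m_i^r\|^{2\ell}$ can be easily proved to be bounded using the induction hypothesis
$\mathcal{B}_{t-1}(e)$.

Hence for any fixed $t$, (73) vanishes almost surely when $N$ goes to $\infty$.
Now, given $x^1, \dots, x^t$, consider the random variables

$$\tilde{X}_i = \phi\Big(x_i^1, \dots, x_i^t, \Big[\sum_{r=0}^{t-1} x^{r+1}\alpha_r + \tilde{A} m^t_\perp\Big]_i, y_i\Big)$$

and $X_i = \tilde{X}_i - \mathbb{E}_{\tilde{A}}\{\tilde{X}_i\}$. Proceeding as in $\mathcal{B}_0$, and using the pseudo-Lipschitz property of $\phi$, it
is easy to check the conditions of Theorem 2. We therefore get

$$\lim_{N\to\infty} \frac{1}{|C_a^N|}\sum_{i\in C_a^N} \Big[ \phi\Big(x_i^1, \dots, x_i^t, \Big[\sum_{r=0}^{t-1} x^{r+1}\alpha_r + \tilde{A} m^t_\perp\Big]_i, y_i\Big) - \mathbb{E}_{\tilde{A}}\, \phi\Big(x_i^1, \dots, x_i^t, \Big[\sum_{r=0}^{t-1} x^{r+1}\alpha_r + \tilde{A} m^t_\perp\Big]_i, y_i\Big) \Big] \stackrel{\rm a.s.}{=} 0. \tag{74}$$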



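Note that $[\tilde{A} m^t_\perp]_i$ is a gaussian random vector with covariance $\langle m^t_\perp, m^t_\perp \rangle$. Further $\langle m^t_\perp, m^t_\perp \rangle$
converges to a finite limit almost surely as $N \to \infty$. Indeed $\langle m^t_\perp, m^t_\perp \rangle = \langle m^t, m^t \rangle - \langle m^t_\parallel, m^t_\parallel \rangle$.
By $\mathcal{B}_t(c)$, $\langle m^t, m^t \rangle$ converges to a finite limit. Further, $\langle m^t_\parallel, m^t_\parallel \rangle = \sum_{r,s=0}^{t-1} \alpha_r^{\sf T} \langle m^r, m^s \rangle \alpha_s$ also
converges, since the products $\langle m^r, m^s \rangle$ do (by part (c) they share their limits with $\langle x^{r+1}, x^{s+1} \rangle$) and so do the coefficients $\alpha_r$, $r \le t - 1$, as discussed in $\mathcal{B}_t(f)$.

Hence we can use the induction hypothesis $\mathcal{B}_{t-1}(b)$ for

$$\tilde{\phi}(x_i^1, \dots, x_i^t, y_i) = \mathbb{E}_Z\Big\{ \phi\Big(x_i^1, \dots, x_i^t, \sum_{r=0}^{t-1} \alpha_r^{\sf T} x_i^{r+1} + \langle m^t_\perp, m^t_\perp \rangle^{1/2} Z,\, y_i\Big) \Big\},$$

with $Z \sim \mathsf{N}(0, {\rm I}_{q\times q})$ independent of $x_i^{r+1}$, $r \le t - 1$, to show

$$\lim_{N\to\infty} \frac{1}{|C_a^N|}\sum_{i\in C_a^N} \mathbb{E}_{\tilde{A}}\, \phi\Big(x_i^1, \dots, x_i^t, \Big[\sum_{r=0}^{t-1} x^{r+1}\alpha_r + \tilde{A} m^t_\perp\Big]_i, y_i\Big) \stackrel{\rm a.s.}{=} \mathbb{E}\,\mathbb{E}_Z\Big\{ \phi\Big(Z_a^1, \dots, Z_a^t, \sum_{r=0}^{t-1} \alpha_r^{\sf T} Z_a^{r+1} + \Gamma_t Z,\, Y_a\Big) \Big\}, \tag{75}$$

where $\Gamma_t$ denotes the almost sure limit of $\langle m^t_\perp, m^t_\perp \rangle^{1/2}$.

Note that $\sum_{r=0}^{t-1} \alpha_r^{\sf T} Z_a^{r+1} + \Gamma_t Z$ is a gaussian vector. All that we need is to show that the
covariance matrix of this gaussian vector is $\Sigma^{t+1}$. But using a combination of (74) and (75)
for the pseudo-Lipschitz functions $\phi(v_1, \dots, v_{t+1}, y_i) = v_{t+1}(\ell)\, v_{t+1}(k)$, for all $\ell, k \in [q]$,

$$\lim_{N\to\infty} \langle x^{t+1}_{C_a^N}, x^{t+1}_{C_a^N} \rangle \stackrel{\rm a.s.}{=} \mathbb{E}\Big\{ \Big( \sum_{r=0}^{t-1} \alpha_r^{\sf T} Z_a^{r+1} + \Gamma_t Z \Big) \Big( \sum_{r=0}^{t-1} \alpha_r^{\sf T} Z_a^{r+1} + \Gamma_t Z \Big)^{\sf T} \Big\}. \tag{76}$$

On the other hand, as proved in part (c),

$$\lim_{N\to\infty} \langle x^{t+1}_{C_a^N}, x^{t+1}_{C_a^N} \rangle \stackrel{\rm a.s.}{=} \lim_{N\to\infty} \langle m^t, m^t \rangle = \lim_{N\to\infty} \langle f(x^t; t), f(x^t; t) \rangle.$$

Hence,

$$\lim_{N\to\infty} \langle x^{t+1}_{C_a^N}, x^{t+1}_{C_a^N} \rangle \stackrel{\rm a.s.}{=} \lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} f(x_i^t; t)\,[f(x_i^t; t)]^{\sf T} = \sum_{a\in[q]} c_a \lim_{N\to\infty} \frac{1}{|C_a^N|}\sum_{i\in C_a^N} g(x_i^t, y_i, a, t)\, g(x_i^t, y_i, a, t)^{\sf T}.$$

By the induction hypothesis $\mathcal{B}_{t-1}(b)$ for the pseudo-Lipschitz functions

$$\phi(v_1, \dots, v_t, y_i) = [g(v_t, y_i, a, t)]_\ell\, [g(v_t, y_i, a, t)]_k,$$

for all $\ell, k \in [q]$, we get

$$\lim_{N\to\infty} \frac{1}{|C_a^N|}\sum_{i\in C_a^N} g(x_i^t, y_i, a, t)\, g(x_i^t, y_i, a, t)^{\sf T} \stackrel{\rm a.s.}{=} \mathbb{E}\big\{ g(Z_a^t, Y_a, a, t)\, g(Z_a^t, Y_a, a, t)^{\sf T} \big\} = \hat{\Sigma}_a^t.$$

Consequently,

$$\lim_{N\to\infty} \langle x^{t+1}_{C_a^N}, x^{t+1}_{C_a^N} \rangle \stackrel{\rm a.s.}{=} \sum_{a\in[q]} c_a \hat{\Sigma}_a^t = \Sigma^{t+1},$$

which proves the claim.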


(d) In a very similar manner to the proof of $\mathcal{B}_0(d)$, using part (b) for the pseudo-Lipschitz function
$\phi : \mathcal{V}_{q,t+2} \to \mathbb{R}$ given by $\phi(x_i^1, \dots, x_i^{t+1}, y_i) = x_i^{r+1}(\ell)\,[\varphi_a(x_i^{s+1}, y_i)]_k$, for all $\ell, k \in [q]$, we can
obtain

$$\lim_{N\to\infty} \langle x^{r+1}_{C_a^N}, \varphi_a(x^{s+1}_{C_a^N}, y_{C_a^N}) \rangle \stackrel{\rm a.s.}{=} \mathbb{E}\big( Z_a^{r+1} [\varphi_a(Z_a^{s+1}, Y_a)]^{\sf T} \big),$$

for gaussian vectors $Z_a^{r+1} \sim \mathsf{N}(0, \Sigma^{r+1})$, $Z_a^{s+1} \sim \mathsf{N}(0, \Sigma^{s+1})$. Using Lemma 5, we have almost
surely,

$$\mathbb{E}\big( Z_a^{r+1} [\varphi_a(Z_a^{s+1}, Y_a)]^{\sf T} \big) = {\rm Cov}(Z_a^{r+1}, Z_a^{s+1})\, \mathbb{E}\Big( \Big[ \frac{\partial\varphi_a}{\partial x}(Z_a^{s+1}, Y_a) \Big]^{\sf T} \Big).$$

By another application of part (b) for $\phi(x_i^1, \dots, x_i^{t+1}, y_i) = x_i^{r+1}(\ell)\, x_i^{s+1}(k)$ for all $\ell, k \in [q]$,

$$\lim_{N\to\infty} \langle x^{r+1}_{C_a^N}, x^{s+1}_{C_a^N} \rangle \stackrel{\rm a.s.}{=} {\rm Cov}(Z_a^{r+1}, Z_a^{s+1}).$$

Similar to $\mathcal{B}_0(d)$ we also have $\lim_{N\to\infty} \langle \nabla\varphi_a(x^{s+1}_{C_a^N}, y_{C_a^N}) \rangle = \mathbb{E}\big([\frac{\partial\varphi_a}{\partial x}(Z_a^{s+1}, Y_a)]^{\sf T}\big)$, almost surely,
as the empirical distribution of $\{(x_i^{s+1}, y_i)\}_{i\in C_a^N}$ converges weakly to the distribution of $(Z_a^{s+1}, Y_a)$.
This finishes the proof of Eq. (62).

Eq. (63) follows from Eq. (62) exactly by the same argument as in $\mathcal{B}_0(d)$.

A Reference probability results

In this appendix, we summarize a few probability facts that are repeatedly used in the proof of
Lemma 2. We start with the following strong law of large numbers (SLLN) for triangular arrays
of independent but not identically distributed random variables. The form stated below follows
immediately from [HT97, Theorem 2.1].

Theorem 2 (SLLN, [HT97]). Let $\{X_{n,i} : 1 \le i \le n,\ n \ge 1\}$ be a triangular array of random
variables with $(X_{n,1}, \dots, X_{n,n})$ mutually independent with mean equal to zero for each $n$ and
$n^{-1}\sum_{i=1}^{n} \mathbb{E}|X_{n,i}|^{2+\kappa} \le c\, n^{\kappa/2}$ for some $0 < \kappa < 1$, $c < \infty$. Then $\frac{1}{n}\sum_{i=1}^{n} X_{n,i} \to 0$ almost surely for
$n \to \infty$.
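To make the statement concrete, the following sketch simulates one possible triangular array: independent, zero-mean Gaussian entries whose standard deviation changes with both $i$ and $n$. The scale profile, the row sizes and the seed are arbitrary choices made for illustration. Since the entries have uniformly bounded moments, the moment condition of Theorem 2 holds, and the row averages are expected to shrink as $n$ grows.

```python
# Sketch: row averages of a triangular array of independent, zero-mean,
# non-identically distributed entries. Scales and sizes are assumptions.
import random

def triangular_row_mean(n, rng):
    """Average of row n; entry i has standard deviation 0.5 + 1.5 * i / n."""
    total = 0.0
    for i in range(1, n + 1):
        scale = 0.5 + 1.5 * i / n      # bounded scale, varies with i and n
        total += scale * rng.gauss(0.0, 1.0)
    return total / n

rng = random.Random(0)
means = {n: triangular_row_mean(n, rng) for n in (100, 1000, 20000)}
```

The dictionary `means` maps each row length to its empirical average, which gives a quick view of the decay.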

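Next, we present a standard property of Gaussian matrices without proof. This is a generalization
of [BM11, Lemma 2].

Lemma 4. For any deterministic $u \in \mathcal{V}_{q,N}$, $v \in \mathcal{V}_{q,n}$ and a gaussian matrix $A \in \mathbb{R}^{n\times N}$ with i.i.d.
entries $\mathsf{N}(0, 1/N)$, we have

(a) $[Au]_i \stackrel{\rm d}{=} \langle u, u \rangle^{1/2} z$, where $z \sim \mathsf{N}(0, {\rm I}_{q\times q})$.

(b) $\langle Au, v \rangle \stackrel{\rm d}{=} \langle u, u \rangle^{1/2} \langle v, z \rangle$, where $z \in \mathcal{V}_{q,n}$, $z_i \sim \mathsf{N}(0, {\rm I}_{q\times q})$.

(c) $\lim_{n\to\infty} \langle Au, Au \rangle = \langle u, u \rangle$ almost surely.

(d) Consider, for $d \le n$, a $d$-dimensional subspace $W$ of $\mathbb{R}^n$, an orthogonal basis $w_1, \dots, w_d$ of
$W$ with $\|w_i\|^2 = n$ for $i = 1, \dots, d$, and the orthogonal projection $P_W$ onto $W$. Then for
$D = [w_1 | \dots | w_d]$, and $u \in \mathcal{V}_{q,N}$ with $\langle u, u \rangle = {\rm I}_{q\times q}$, we have $P_W A u \stackrel{\rm d}{=} D x$ where $x \in \mathcal{V}_{q,d}$
satisfies $\lim_{n\to\infty} \|x\| \stackrel{\rm a.s.}{=} 0$ (the limit being taken with $d$ fixed).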
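Part (c) can be checked by simulation in the scalar case $q = 1$, where $\langle u, u \rangle = \|u\|^2/N$ and $\langle Au, Au \rangle = \|Au\|^2/n$. The sketch below is illustrative only: the deterministic vector $u$, the dimensions and the seed are assumptions, and the rows of $A$ are generated one at a time so that the matrix is never stored.

```python
# Sketch (q = 1): for A with i.i.d. N(0, 1/N) entries, <Au, Au> is close to
# <u, u> when n is large. The vector u and the sizes are assumptions.
import math
import random

def check_gaussian_matrix(n, N, rng):
    """Return (<u, u>, <Au, Au>) for an n x N Gaussian matrix A."""
    u = [math.sin(j + 1.0) + 0.5 for j in range(N)]   # arbitrary deterministic u
    uu = sum(x * x for x in u) / N                    # <u, u>
    std = 1.0 / math.sqrt(N)                          # entries of A are N(0, 1/N)
    acc = 0.0
    for _ in range(n):
        row_dot = sum(rng.gauss(0.0, std) * x for x in u)   # [A u]_i
        acc += row_dot * row_dot
    return uu, acc / n                                # <u, u>, <Au, Au>

uu, au_au = check_gaussian_matrix(400, 200, random.Random(1))
```

Each entry $[Au]_i$ is Gaussian with variance $\langle u, u \rangle$, in agreement with part (a), so the average of the squares concentrates around $\langle u, u \rangle$.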

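Lemma 5 (Stein's Lemma [Ste72]). For jointly gaussian random vectors $Z_1, Z_2 \in \mathbb{R}^q$ with zero
mean, and any function $\varphi : \mathbb{R}^q \to \mathbb{R}^q$ where $\mathbb{E}\{\frac{\partial\varphi}{\partial z}(Z_2)\}$ and $\mathbb{E}\{Z_1 [\varphi(Z_2)]^{\sf T}\}$ exist, the following holds

$$\mathbb{E}\{Z_1 [\varphi(Z_2)]^{\sf T}\} = {\rm Cov}(Z_1, Z_2)\, \mathbb{E}\Big\{ \Big[ \frac{\partial\varphi}{\partial z}(Z_2) \Big]^{\sf T} \Big\}.$$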
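Stein's identity is easy to test by Monte Carlo in the scalar case. The sketch below assumes $q = 1$, unit-variance gaussians with correlation $\rho = 0.6$, and $\varphi = \tanh$ (so that $\varphi' = 1 - \tanh^2$); these choices and the function name are illustrative. It estimates both sides of the identity from the same samples.

```python
# Sketch (q = 1): Monte Carlo check of E[Z1 * phi(Z2)] = Cov(Z1, Z2) * E[phi'(Z2)]
# with phi = tanh. The correlation, sample size and seed are assumptions.
import math
import random

def stein_check(rho, samples, rng):
    """Return estimates of (E[Z1 tanh(Z2)], rho * E[1 - tanh(Z2)^2])."""
    lhs = rhs = 0.0
    for _ in range(samples):
        z2 = rng.gauss(0.0, 1.0)
        # Z1 has unit variance and covariance rho with Z2.
        z1 = rho * z2 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        lhs += z1 * math.tanh(z2)
        rhs += 1.0 - math.tanh(z2) ** 2      # derivative of tanh at z2
    return lhs / samples, rho * rhs / samples

lhs, rhs = stein_check(0.6, 200000, random.Random(2))
```

The two estimates agree up to Monte Carlo error, which is of order $1/\sqrt{\text{samples}}$.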

The following law of large numbers is a generalization of [BM11, Lemma 4] and can be proved
in a very similar manner.

Lemma 6. Let $k \ge 2$ and consider a sequence of vectors $\{v(N)\}_{N\ge 0}$ in $\mathcal{V}_{q,N}$, whose empirical
distribution, denoted by $\hat{p}_{v(N)}$, converges weakly to a probability measure $p_V$ on $\mathbb{R}^q$, such that
$\mathbb{E}_{p_V}(\|V\|^k) < \infty$. Further assume $\mathbb{E}_{\hat{p}_{v(N)}}(\|V\|^k) \to \mathbb{E}_{p_V}(\|V\|^k)$ as $N \to \infty$. Then, for any pseudo-Lipschitz
function $\psi : \mathbb{R}^q \to \mathbb{R}$ of order $k$:

$$\lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} \psi(v_i) \stackrel{\rm a.s.}{=} \mathbb{E}[\psi(V)]. \tag{77}$$